Solr
  1. Solr
  2. SOLR-2010

Improvements to SpellCheckComponent Collate functionality

    Details

    • Type: New Feature New Feature
    • Status: Closed
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: 1.4.1
    • Fix Version/s: 3.1, 4.0-ALPHA
    • Labels:
      None
    • Environment:

      Tested against trunk revision 966633

      Description

      Improvements to SpellCheckComponent Collate functionality

      Our project requires a better Spell Check Collator. I'm contributing this as a patch to get suggestions for improvements and in case there is a broader need for these features.

      1. Only return collations that are guaranteed to result in hits if re-queried (applying original fq params also). This is especially helpful when there is more than one correction per query. The 1.4 behavior does not verify that a particular combination will actually return hits.
      2. Provide the option to get multiple collation suggestions
      3. Provide extended collation results including the # of hits re-querying will return and a breakdown of each misspelled word and its correction.

      This patch is similar to what is described in SOLR-507 item #1. Also, this patch provides a viable workaround for the problem discussed in SOLR-1074. A dictionary could be created that combines the terms from the multiple fields. The collator then would prune out any spurious suggestions this would cause.

      This patch adds the following spellcheck parameters:

      1. spellcheck.maxCollationTries - maximum # of collation possibilities to try before giving up. Lower values ensure better performance. Higher values may be necessary to find a collation that can return results. Default is 0, which maintains backwards-compatible behavior (do not check collations).

      2. spellcheck.maxCollations - maximum # of collations to return. Default is 1, which maintains backwards-compatible behavior.

      3. spellcheck.collateExtendedResult - if true, returns an expanded response format detailing collations found. default is false, which maintains backwards-compatible behavior. When true, output is like this (in context):

      <lst name="spellcheck">
      <lst name="suggestions">
      <lst name="hopq">
      <int name="numFound">94</int>
      <int name="startOffset">7</int>
      <int name="endOffset">11</int>
      <arr name="suggestion">
      <str>hope</str>
      <str>how</str>
      <str>hope</str>
      <str>chops</str>
      <str>hoped</str>
      etc
      </arr>
      <lst name="faill">
      <int name="numFound">100</int>
      <int name="startOffset">16</int>
      <int name="endOffset">21</int>
      <arr name="suggestion">
      <str>fall</str>
      <str>fails</str>
      <str>fail</str>
      <str>fill</str>
      <str>faith</str>
      <str>all</str>
      etc
      </arr>
      </lst>
      <lst name="collation">
      <str name="collationQuery">Title:(how AND fails)</str>
      <int name="hits">2</int>
      <lst name="misspellingsAndCorrections">
      <str name="hopq">how</str>
      <str name="faill">fails</str>
      </lst>
      </lst>
      <lst name="collation">
      <str name="collationQuery">Title:(hope AND faith)</str>
      <int name="hits">2</int>
      <lst name="misspellingsAndCorrections">
      <str name="hopq">hope</str>
      <str name="faill">faith</str>
      </lst>
      </lst>
      <lst name="collation">
      <str name="collationQuery">Title:(chops AND all)</str>
      <int name="hits">1</int>
      <lst name="misspellingsAndCorrections">
      <str name="hopq">chops</str>
      <str name="faill">all</str>
      </lst>
      </lst>
      </lst>
      </lst>

      In addition, SOLRJ is updated to include SpellCheckResponse.getCollatedResults(), which will return the expanded Collation format. getCollatedResult(), which returns a single String, is retained for backwards-compatibility. Other APIs were not changed but will still work provided that spellcheck.collateExtendedResult is false.

      This likely will not return valid results if using Shards. Rather, a more robust interaction with the index would be necessary than what exists in SpellCheckCollator.collate().

      1. SOLR-2010.patch
        39 kB
        James Dyer
      2. SOLR-2010.patch
        44 kB
        Grant Ingersoll
      3. SOLR-2010.txt
        68 kB
        James Dyer
      4. SOLR-2010.patch
        71 kB
        James Dyer
      5. SOLR-2010.patch
        70 kB
        James Dyer
      6. SOLR-2010_shardSearchHandler_993538.patch
        70 kB
        James Dyer
      7. SOLR-2010_shardRecombineCollations_993538.patch
        70 kB
        James Dyer
      8. SOLR-2010_shardSearchHandler_999521.patch
        70 kB
        James Dyer
      9. SOLR-2010_shardRecombineCollations_999521.patch
        70 kB
        James Dyer
      10. SOLR-2010_141.patch
        61 kB
        James Dyer
      11. solr_2010_3x.patch
        70 kB
        James Dyer
      12. SOLR-2010_141.patch
        55 kB
        James Dyer
      13. multiple_collations_as_an_array.patch
        7 kB
        James Dyer

        Issue Links

          Activity

          Hide
          James Dyer added a comment -

          Tested against branch version #96633

          Show
          James Dyer added a comment - Tested against branch version #96633
          Hide
          Grant Ingersoll added a comment -

          James, thanks for the patch. At first glance this looks great and I would like to see it incorporated.

          This likely will not return valid results if using Shards. Rather, a more robust interaction with the index would be necessary than what exists in SpellCheckCollator.collate().

          Perhaps we should just have a simple Search Handler that is QueryComp only, either that or we need a way to easily turn off all components but the query component. That way, we could take advantage of the existing sharding capabilities.

          Show
          Grant Ingersoll added a comment - James, thanks for the patch. At first glance this looks great and I would like to see it incorporated. This likely will not return valid results if using Shards. Rather, a more robust interaction with the index would be necessary than what exists in SpellCheckCollator.collate(). Perhaps we should just have a simple Search Handler that is QueryComp only, either that or we need a way to easily turn off all components but the query component. That way, we could take advantage of the existing sharding capabilities.
          Hide
          Grant Ingersoll added a comment -

          Added license headers

          Show
          Grant Ingersoll added a comment - Added license headers
          Hide
          Tom Phethean added a comment -

          This sounds like a really useful patch, I would definitely like to see it go further as it would be useful for a project I'm currently working on. I have just tried to patch this against 1.4.1 (downloaded today) and got the following errors:

          patching file solr/src/test/org/apache/solr/spelling/SpellPossibilityIteratorTest.java
          patching file solr/src/test/org/apache/solr/spelling/SpellCheckCollatorTest.java
          patching file solr/src/test/org/apache/solr/client/solrj/response/TestSpellCheckResponse.java
          Hunk #1 FAILED at 20.
          Hunk #2 FAILED at 103.
          2 out of 2 hunks FAILED – saving rejects to file solr/src/test/org/apache/solr/client/solrj/response/TestSpellCheckResponse.java.rej
          patching file solr/src/java/org/apache/solr/handler/component/SpellCheckComponent.java
          Hunk #1 FAILED at 132.
          Hunk #2 FAILED at 361.
          Hunk #3 FAILED at 405.
          Hunk #4 FAILED at 452.
          Hunk #5 FAILED at 466.
          5 out of 5 hunks FAILED – saving rejects to file solr/src/java/org/apache/solr/handler/component/SpellCheckComponent.java.rej
          patching file solr/src/java/org/apache/solr/spelling/SpellCheckCollation.java
          patching file solr/src/java/org/apache/solr/spelling/PossibilityIterator.java
          patching file solr/src/java/org/apache/solr/spelling/SpellCheckCorrection.java
          patching file solr/src/java/org/apache/solr/spelling/SpellCheckCollator.java
          patching file solr/src/common/org/apache/solr/common/params/SpellingParams.java
          Hunk #1 FAILED at 78.
          1 out of 1 hunk FAILED – saving rejects to file solr/src/common/org/apache/solr/common/params/SpellingParams.java.rej
          patching file solr/src/solrj/org/apache/solr/client/solrj/response/SpellCheckResponse.java
          Hunk #1 FAILED at 31.
          Hunk #2 FAILED at 46.
          Hunk #3 FAILED at 77.
          Hunk #4 FAILED at 162.
          4 out of 4 hunks FAILED – saving rejects to file solr/src/solrj/org/apache/solr/client/solrj/response/SpellCheckResponse.java.rej

          Show
          Tom Phethean added a comment - This sounds like a really useful patch, I would definitely like to see it go further as it would be useful for a project I'm currently working on. I have just tried to patch this against 1.4.1 (downloaded today) and got the following errors: patching file solr/src/test/org/apache/solr/spelling/SpellPossibilityIteratorTest.java patching file solr/src/test/org/apache/solr/spelling/SpellCheckCollatorTest.java patching file solr/src/test/org/apache/solr/client/solrj/response/TestSpellCheckResponse.java Hunk #1 FAILED at 20. Hunk #2 FAILED at 103. 2 out of 2 hunks FAILED – saving rejects to file solr/src/test/org/apache/solr/client/solrj/response/TestSpellCheckResponse.java.rej patching file solr/src/java/org/apache/solr/handler/component/SpellCheckComponent.java Hunk #1 FAILED at 132. Hunk #2 FAILED at 361. Hunk #3 FAILED at 405. Hunk #4 FAILED at 452. Hunk #5 FAILED at 466. 5 out of 5 hunks FAILED – saving rejects to file solr/src/java/org/apache/solr/handler/component/SpellCheckComponent.java.rej patching file solr/src/java/org/apache/solr/spelling/SpellCheckCollation.java patching file solr/src/java/org/apache/solr/spelling/PossibilityIterator.java patching file solr/src/java/org/apache/solr/spelling/SpellCheckCorrection.java patching file solr/src/java/org/apache/solr/spelling/SpellCheckCollator.java patching file solr/src/common/org/apache/solr/common/params/SpellingParams.java Hunk #1 FAILED at 78. 1 out of 1 hunk FAILED – saving rejects to file solr/src/common/org/apache/solr/common/params/SpellingParams.java.rej patching file solr/src/solrj/org/apache/solr/client/solrj/response/SpellCheckResponse.java Hunk #1 FAILED at 31. Hunk #2 FAILED at 46. Hunk #3 FAILED at 77. Hunk #4 FAILED at 162. 4 out of 4 hunks FAILED – saving rejects to file solr/src/solrj/org/apache/solr/client/solrj/response/SpellCheckResponse.java.rej
          Hide
          Grant Ingersoll added a comment -

          The patch is currently for trunk. I think it will likely be the case that we work it out for trunk and then backport.

          Show
          Grant Ingersoll added a comment - The patch is currently for trunk. I think it will likely be the case that we work it out for trunk and then backport.
          Hide
          Tom Phethean added a comment -

          Ok, thanks. Do you know if there is a rough timescale on that?

          Show
          Tom Phethean added a comment - Ok, thanks. Do you know if there is a rough timescale on that?
          Hide
          James Dyer added a comment -

          Second version of patch. Updated to trunk rev #986945.

          Adds support for shards. I originally implemented this by passing the SearchHandler to the SpellCheckComponent and then using an overloaded version of SearchHandler.handleRequestBody() to do the re-queries. I found this was unnecessary as we get the same results by calling the QueryComponent directly.

          I added some test scenarios to "DistributedSpellCheckComponentTest" and all pass. However, I am a bit disturbed to find that the test fails if I uncomment the constructor (added with this patch). The constructor simply tells it to test only with 4 shards rather than trying 1 shard, then 2, etc. I found either way the 4-shard test results in the same docs going to the same shards. Yet the results are different. Specifically the ranking/ordering of the collations returned and the # of hits reported are sometimes wrong when the constructor is called before the test. Unfortunately I am at a loss as to why I get inconsistent results here and anyone's assistance on this would be most helpful.

          I also added an additional unit test method to verify this works when multiple request handlers are configured with different "qf" params. I also added a unit test method that verifies this works when "fq" is set.

          Show
          James Dyer added a comment - Second version of patch. Updated to trunk rev #986945. Adds support for shards. I originally implemented this by passing the SearchHandler to the SpellCheckComponent and then using an overloaded version of SearchHandler.handleRequestBody() to do the re-queries. I found this was unnecessary as we get the same results by calling the QueryComponent directly. I added some test scenarios to "DistributedSpellCheckComponentTest" and all pass. However, I am a bit disturbed to find that the test fails if I uncomment the constructor (added with this patch). The constructor simply tells it to test only with 4 shards rather than trying 1 shard, then 2, etc. I found either way the 4-shard test results in the same docs going to the same shards. Yet the results are different. Specifically the ranking/ordering of the collations returned and the # of hits reported are sometimes wrong when the constructor is called before the test. Unfortunately I am at a loss as to why I get inconsistent results here and anyone's assistance on this would be most helpful. I also added an additional unit test method to verify this works when multiple request handlers are configured with different "qf" params. I also added a unit test method that verifies this works when "fq" is set.
          Hide
          Grant Ingersoll added a comment -

          Adds support for shards. I originally implemented this by passing the SearchHandler to the SpellCheckComponent and then using an overloaded version of SearchHandler.handleRequestBody() to do the re-queries. I found this was unnecessary as we get the same results by calling the QueryComponent directly.

          I haven't taken a look at the patch yet, but by the sounds of it, I still think the cleaner way to go is to make Solr have an option to specifically pass in which component to run and turn off all others. This would be useful for other things, too. Then you could just use the existing mechanisms.

          Show
          Grant Ingersoll added a comment - Adds support for shards. I originally implemented this by passing the SearchHandler to the SpellCheckComponent and then using an overloaded version of SearchHandler.handleRequestBody() to do the re-queries. I found this was unnecessary as we get the same results by calling the QueryComponent directly. I haven't taken a look at the patch yet, but by the sounds of it, I still think the cleaner way to go is to make Solr have an option to specifically pass in which component to run and turn off all others. This would be useful for other things, too. Then you could just use the existing mechanisms.
          Hide
          James Dyer added a comment -

          Third version (with ".patch" extension. I had used ".txt" extension with 2nd version). Works with trunk rev#986945.

          This time SpellCheckCollator calls the SearchHandler instead of calling the QueryComponent. This required exposing a reference to the SearchHandler on the ResponseBuilder. Also a new overloaded method in SearchHandler.processRequestBody() lets you override the list of components to run. In this case we just have it run QueryComponent.

          This revision has 2 potential benefits:

          (1) the overloaded method in SearchHandler may prove useful to other components in the future.

          (2) there may be a way to get SearchHandler to requery all the shards at once and then there would be no need to reintegrate the Collations in SearchHandler.finishStage(). However, see my comment in SpellCheckCollator lines 56-57. Likely I am calling SpellCheckCollator during the wrong "stage" of the distributed request but I a need to find out more specifically how shards work to determine how to further improve this here. As time allows I will do my own investigating but anyone's advice would be greatly appreciated.

          Finally, this version corrects a bug that would have caused one of the test scenarios in DistributedSpellCheckComponentTest to fail. Unfortunately in the 2nd version, I had left some scenarios commented-out and did not catch this until now.

          Show
          James Dyer added a comment - Third version (with ".patch" extension. I had used ".txt" extension with 2nd version). Works with trunk rev#986945. This time SpellCheckCollator calls the SearchHandler instead of calling the QueryComponent. This required exposing a reference to the SearchHandler on the ResponseBuilder. Also a new overloaded method in SearchHandler.processRequestBody() lets you override the list of components to run. In this case we just have it run QueryComponent. This revision has 2 potential benefits: (1) the overloaded method in SearchHandler may prove useful to other components in the future. (2) there may be a way to get SearchHandler to requery all the shards at once and then there would be no need to reintegrate the Collations in SearchHandler.finishStage(). However, see my comment in SpellCheckCollator lines 56-57. Likely I am calling SpellCheckCollator during the wrong "stage" of the distributed request but I a need to find out more specifically how shards work to determine how to further improve this here. As time allows I will do my own investigating but anyone's advice would be greatly appreciated. Finally, this version corrects a bug that would have caused one of the test scenarios in DistributedSpellCheckComponentTest to fail. Unfortunately in the 2nd version, I had left some scenarios commented-out and did not catch this until now.
          Hide
          James Dyer added a comment -

          New Patch Version with Shard Support. Grant, I hope I'm getting closer to what you have in mind this time around.

          I think I've figured how to send the collation test queries back to SearchHandler and have it take care of querying the shards individually. Then the collation logic is no different for distributed / non-distributed.

          As I would like to eventually use this in production here, any comments as to how to further make this a "production-quality" feature are much appreciated.

          Show
          James Dyer added a comment - New Patch Version with Shard Support. Grant, I hope I'm getting closer to what you have in mind this time around. I think I've figured how to send the collation test queries back to SearchHandler and have it take care of querying the shards individually. Then the collation logic is no different for distributed / non-distributed. As I would like to eventually use this in production here, any comments as to how to further make this a "production-quality" feature are much appreciated.
          Hide
          Grant Ingersoll added a comment -

          Hi James,

          First off, good work. I like the overall design, etc.

          Second, this patch no longer applies cleanly to trunk. The issue is in the SearchHandler.

          Third, in thinking some more about the whole distributed case, perhaps we are approaching this wrong. I was originally thinking that we would have to go off and re-query all the shards (as in send another message) but we really shouldn't have to do that, right? Why can't we just pass the collation request through to the shards as part of the get suggestions and then it can, if collation is asked for, return it's collation suggestions. Then, the question becomes how to merge the suggestions and pick the best one. This should save a round trip at the cost of doing some extra collations, but since most people aren't going to ask for more than 5 or 10, it shouldn't be an issue.

          -Grant

          Show
          Grant Ingersoll added a comment - Hi James, First off, good work. I like the overall design, etc. Second, this patch no longer applies cleanly to trunk. The issue is in the SearchHandler. Third, in thinking some more about the whole distributed case, perhaps we are approaching this wrong. I was originally thinking that we would have to go off and re-query all the shards (as in send another message) but we really shouldn't have to do that, right? Why can't we just pass the collation request through to the shards as part of the get suggestions and then it can, if collation is asked for, return it's collation suggestions. Then, the question becomes how to merge the suggestions and pick the best one. This should save a round trip at the cost of doing some extra collations, but since most people aren't going to ask for more than 5 or 10, it shouldn't be an issue. -Grant
          Hide
          James Dyer added a comment - - edited

          Two new versions of the patch:

          1. SOLR-2010_shardSearchHandler_993538.patch is the same as the 8/23/2010 version except it applies cleanly to trunk revision #993538. In a Distributed setup, this version calls an overloaded method on SearchHandler to use its logic for combining results from the collation test queries. This is simpler code but requires many more round-trips between shards. We also can guarantee that a Distributed setup will always return the exact same collations in order as a non-Distributed setup.

          2. SOLR-2010_shardRecombineCollations_993538.patch is similar to the 8/19/2010 version, with improvements. This version also applies cleanly to trunk revision #993538. In a Distributed setup, each shard calls QueryComponent individually and generates its own list of Collations. The SpellCheckComponent then combines and sorts the resulting collations, returning the best ones, up to the client-specified maximum. This requires more complicated logic in SpellCheckComponent.finishStage(), although it does not necessitate changes to SearchHandler or ResponseBuilder. It may be possible to find cases where a Distributed setup may return different collations-or the same collations in a different order-than a non-distributed setup. I do not believe this potential disparity would ever be very significant.

          Grant, I believe version 1 is something like what you were thinking of on 8/9 and 8/19. Version 2 is more like what you describe in your comment from 8/30. Let me know if you think this needs any more tweaking. ALSO, if you're thinking of possibly committing this someday, you may want to look at SOLR-2083 also. Based on my understanding, distributed SpellCheckComponent as exists currently in Trunk is broken. (If I'm right), we may want to fix it before adding on more functionality.

          Show
          James Dyer added a comment - - edited Two new versions of the patch: 1. SOLR-2010 _shardSearchHandler_993538.patch is the same as the 8/23/2010 version except it applies cleanly to trunk revision #993538. In a Distributed setup, this version calls an overloaded method on SearchHandler to use its logic for combining results from the collation test queries. This is simpler code but requires many more round-trips between shards. We also can guarantee that a Distributed setup will always return the exact same collations in order as a non-Distributed setup. 2. SOLR-2010 _shardRecombineCollations_993538.patch is similar to the 8/19/2010 version, with improvements. This version also applies cleanly to trunk revision #993538. In a Distributed setup, each shard calls QueryComponent individually and generates its own list of Collations. The SpellCheckComponent then combines and sorts the resulting collations, returning the best ones, up to the client-specified maximum. This requires more complicated logic in SpellCheckComponent.finishStage(), although it does not necessitate changes to SearchHandler or ResponseBuilder. It may be possible to find cases where a Distributed setup may return different collations- or the same collations in a different order -than a non-distributed setup. I do not believe this potential disparity would ever be very significant. Grant, I believe version 1 is something like what you were thinking of on 8/9 and 8/19. Version 2 is more like what you describe in your comment from 8/30. Let me know if you think this needs any more tweaking. ALSO, if you're thinking of possibly committing this someday, you may want to look at SOLR-2083 also. Based on my understanding, distributed SpellCheckComponent as exists currently in Trunk is broken. (If I'm right), we may want to fix it before adding on more functionality.
          Hide
          James Dyer added a comment -

          Both patch versions sync'ed to Trunk version 999521. (sorry about the many filename variants)

          Show
          James Dyer added a comment - Both patch versions sync'ed to Trunk version 999521. (sorry about the many filename variants)
          Hide
          James Dyer added a comment -

          This version is for v1.4.1. No shard support as SpellCheckComponent does not have any distributed support in 1.4. All tests pass.

          Show
          James Dyer added a comment - This version is for v1.4.1. No shard support as SpellCheckComponent does not have any distributed support in 1.4. All tests pass.
          Hide
          Grant Ingersoll added a comment -

          I haven't forgotten about this, James. I appreciate the hard work, but I'm swamped at the moment. I'll try to get you feedback soon.

          Show
          Grant Ingersoll added a comment - I haven't forgotten about this, James. I appreciate the hard work, but I'm swamped at the moment. I'll try to get you feedback soon.
          Hide
          JAYABAALAN V added a comment -

          I am using Apache Solr 1.4 and need to download this patch for my implementation purpose.Let me know the latest rev#

          Show
          JAYABAALAN V added a comment - I am using Apache Solr 1.4 and need to download this patch for my implementation purpose.Let me know the latest rev#
          Hide
          James Dyer added a comment -

          The patch file SOLR-2010_141.patch should apply cleanly to v1.4.1.

          Show
          James Dyer added a comment - The patch file SOLR-2010 _141.patch should apply cleanly to v1.4.1.
          Hide
          Grant Ingersoll added a comment -

          James,

          For the two diff. approaches, did you do any testing to get a sense of which performs better? It seems to me that the recombine one, while overfetching some, would likely be faster overall b/c it avoids all the extra shard communication. Of course, it may be the case that for some setups, one works better than the other. i.e. small sharded systems can afford the second call, while large systems should avoid the second fan-out/in. Which, of course, makes me wonder how hard it would be to give people both and let them specify based on an input parameter or by having two different components derived off of SpellCheckComponent? Thoughts? In that approach, we could have SCC be just for single node instances and then the other two inherit from it to provide users the choice of distributed approaches. Since you have the code for both already, what do you think?

          Otherwise, I've looked at the recombine approach and it seems pretty solid from a "ready to commit" standpoint.

          Show
          Grant Ingersoll added a comment - James, For the two diff. approaches, did you do any testing to get a sense of which performs better? It seems to me that the recombine one, while overfetching some, would likely be faster overall b/c it avoids all the extra shard communication. Of course, it may be the case that for some setups, one works better than the other. i.e. small sharded systems can afford the second call, while large systems should avoid the second fan-out/in. Which, of course, makes me wonder how hard it would be to give people both and let them specify based on an input parameter or by having two different components derived off of SpellCheckComponent? Thoughts? In that approach, we could have SCC be just for single node instances and then the other two inherit from it to provide users the choice of distributed approaches. Since you have the code for both already, what do you think? Otherwise, I've looked at the recombine approach and it seems pretty solid from a "ready to commit" standpoint.
          Hide
          James Dyer added a comment -

          Grant,

          It wouldn't be difficult to create an uber-patch that allows users to pick which way to go. If that's the route you want to go then I'd be happy to do that. However, I think it would be best to stick with the "recombine" approach because although you'll get throw-away collations, it will always be done internally within the shard. The performance penalty in most cases will be slight. On the other hand, if using the "Search Handler" approach, it has to query over the network for each try, which could be significant. I wouldn't say that you would never benefit from the "Search Handler" option, but I wonder if it warrants extra lines of code and making changes to the SearchHandler class, etc.

          Unfortunately I haven't done any performance testing with these. We only are in early development here with SOLR and I don't have access to multiple servers with which I can easily deploy such a test. On a non-distributed setup this patch only adds a little bit of overhead, and I wouldn't expect the "recombine" option to be much worse than that.

          Note that with either approach I'd imagine you'd frequently run into the case where some/many shards simply do not have the documents the user is looking for and they will have to query up to "collationMaxTries" to come up empty. In which case the shard(s) that get the results may need to wait for the shards that are busy querying away in vain...

          Let me know if you want an "uber-patch". I might have a little time later today if you let me know.

          Show
          James Dyer added a comment - Grant, It wouldn't be difficult to create an uber-patch that allows users to pick which way to go. If that's the route you want to go then I'd be happy to do that. However, I think it would be best to stick with the "recombine" approach because although you'll get throw-away collations, it will always be done internally within the shard. The performance penalty in most cases will be slight. On the other hand, if using the "Search Handler" approach, it has to query over the network for each try , which could be significant. I wouldn't say that you would never benefit from the "Search Handler" option, but I wonder if it warrants extra lines of code and making changes to the SearchHandler class, etc. Unfortunately I haven't done any performance testing with these. We only are in early development here with SOLR and I don't have access to multiple servers with which I can easily deploy such a test. On a non-distributed setup this patch only adds a little bit of overhead, and I wouldn't expect the "recombine" option to be much worse than that. Note that with either approach I'd imagine you'd frequently run into the case where some/many shards simply do not have the documents the user is looking for and they will have to query up to "collationMaxTries" to come up empty. In which case the shard(s) that get the results may need to wait for the shards that are busy querying away in vain... Let me know if you want an "uber-patch". I might have a little time later today if you let me know.
          Hide
          Grant Ingersoll added a comment -

          Makes sense. I'd say we stick w/ the recombine approach.

          Show
          Grant Ingersoll added a comment - Makes sense. I'd say we stick w/ the recombine approach.
          Hide
          JAYABAALAN V added a comment -

          Let me know the correct path for downloding the patch for SpellChecking based on your discuss.It very discult for the identify the correct patch

          I am try to download things from the following truck

          https://svn.apache.org/repos/asf/lucene/dev/trunk/

          But the new modified code are not present in the this truck for download.do provide any pointer or clear steps for download this patch for spellchecking.

          Show
          JAYABAALAN V added a comment - Let me know the correct path for downloding the patch for SpellChecking based on your discuss.It very discult for the identify the correct patch I am try to download things from the following truck https://svn.apache.org/repos/asf/lucene/dev/trunk/ But the new modified code are not present in the this truck for download.do provide any pointer or clear steps for download this patch for spellchecking.
          Hide
          James Dyer added a comment -

          If you want to use v1.4.1 you can either get the GA source and then apply the patch or check out the 1.4.x branch at http://svn.apache.org/repos/asf/lucene/solr/branches/branch-1.4 then apply the patch file (attached to this JIRA). The correct patch for 1.4.1 is: SOLR-2010_141.patch.

          If you want to use the Trunk Version then check out the source at http://svn.apache.org/repos/asf/lucene/dev/trunk. The correct patch file for Trunk is: SOLR-2010_shardRecombineCollations_999521.patch

          There are instructions on applying patches in the wiki: http://wiki.apache.org/solr/HowToContribute#Working_With_Patches

          Show
          James Dyer added a comment - If you want to use v1.4.1 you can either get the GA source and then apply the patch or check out the 1.4.x branch at http://svn.apache.org/repos/asf/lucene/solr/branches/branch-1.4 then apply the patch file (attached to this JIRA). The correct patch for 1.4.1 is: SOLR-2010 _141.patch. If you want to use the Trunk Version then check out the source at http://svn.apache.org/repos/asf/lucene/dev/trunk . The correct patch file for Trunk is: SOLR-2010 _shardRecombineCollations_999521.patch There are instructions on applying patches in the wiki: http://wiki.apache.org/solr/HowToContribute#Working_With_Patches
          Hide
          JAYABAALAN V added a comment - - edited

          Thanks for your direction.

          Based on your input i have tried in the truck and used the SOLR-2010_shardRecombineCollations_999521.patch for download.

          http://svn.apache.org/repos/asf/lucene/dev/trunk/solr/src/common/org/apache/solr/common/params/SpellingParams.java.this path

          But there is a problem in the SpellingParams.java under this version.It looks not updated correctly in this version.Mainly three final string values like ""maxCollations","maxCollationTries", and collateExtendedResults are implemented and it Solr v1.3 in the history.

          Do let me know the updated version path for downloading.

          Show
          JAYABAALAN V added a comment - - edited Thanks for your direction. Based on your input i have tried in the truck and used the SOLR-2010 _shardRecombineCollations_999521.patch for download. http://svn.apache.org/repos/asf/lucene/dev/trunk/solr/src/common/org/apache/solr/common/params/SpellingParams.java.this path But there is a problem in the SpellingParams.java under this version.It looks not updated correctly in this version.Mainly three final string values like ""maxCollations","maxCollationTries", and collateExtendedResults are implemented and it Solr v1.3 in the history. Do let me know the updated version path for downloading.
          Hide
          James Dyer added a comment -

          I tested "SOLR-2010_shardRecombineCollations_999521.patch" with current trunk and it still applies cleanly. I'm not sure why SpellingParams.java isn't updating for you. Perhaps try again?

          James Dyer
          E-Commerce Systems
          Ingram Content Group
          (615) 213-4311

          Show
          James Dyer added a comment - I tested " SOLR-2010 _shardRecombineCollations_999521.patch" with current trunk and it still applies cleanly. I'm not sure why SpellingParams.java isn't updating for you. Perhaps try again? James Dyer E-Commerce Systems Ingram Content Group (615) 213-4311
          Hide
          JAYABAALAN V added a comment -

          What I mean here is the correct version of code is not updated the mainly three final string method are not updated under this path

          http://svn.apache.org/repos/asf/lucene/dev/trunk/solr/src/common/org/apache/solr/common/params/SpellingParams.java.this path

          that reason i am asking correct steps or correct source path for download purpose

          Can you provide the details on Scheme and SolrConfiguration side also.It would be better i think so.

          Show
          JAYABAALAN V added a comment - What I mean here is the correct version of code is not updated the mainly three final string method are not updated under this path http://svn.apache.org/repos/asf/lucene/dev/trunk/solr/src/common/org/apache/solr/common/params/SpellingParams.java.this path that reason i am asking correct steps or correct source path for download purpose Can you provide the details on Scheme and SolrConfiguration side also.It would be better i think so.
          Hide
          James Dyer added a comment -

          Grant,

          Is there anything else you'd like me to do with this? Is this something you think should be committed?

          Show
          James Dyer added a comment - Grant, Is there anything else you'd like me to do with this? Is this something you think should be committed?
          Hide
          Grant Ingersoll added a comment -

          Committed to trunk on revision 1021439. Working on backporting to 3.x

          James, can you add docs to the wiki for the new parameters?

          Show
          Grant Ingersoll added a comment - Committed to trunk on revision 1021439. Working on backporting to 3.x James, can you add docs to the wiki for the new parameters?
          Hide
          James Dyer added a comment -

          Wiki is updated. I marked it all as available in 3.1 / 4.0 . correct?

          Show
          James Dyer added a comment - Wiki is updated. I marked it all as available in 3.1 / 4.0 . correct?
          Hide
          JAYABAALAN V added a comment -

          I am able to download only these four java class under the revision 1021439 SpellCheckComponent,SpellCheckResponse,SpellingParams,TestSpellCheckResponse and other java class are not updated ResponseBuilder.java, and SearchHandler.java

          Let me know the correct path for these two java classes for revision 1021439

          Show
          JAYABAALAN V added a comment - I am able to download only these four java class under the revision 1021439 SpellCheckComponent,SpellCheckResponse,SpellingParams,TestSpellCheckResponse and other java class are not updated ResponseBuilder.java, and SearchHandler.java Let me know the correct path for these two java classes for revision 1021439
          Hide
          Yonik Seeley added a comment -

          This patch introduced some resource leaks - whenever it tries to verify the collated result by doing a search, it never closes the request and hence the searcher is never closed.

          Show
          Yonik Seeley added a comment - This patch introduced some resource leaks - whenever it tries to verify the collated result by doing a search, it never closes the request and hence the searcher is never closed.
          Hide
          Yonik Seeley added a comment -

          fyi, I just committed a fix for the resource leak, in addition to a couple other simplifications.

          Show
          Yonik Seeley added a comment - fyi, I just committed a fix for the resource leak, in addition to a couple other simplifications.
          Hide
          Yonik Seeley added a comment -

          Is this the patch that added multiple collations?

          I'm seeing stuff like this:

            {...
                "collation":"lowerfilt:(+faith +hope +loaves)",
                "collation":"lowerfilt:(+faith +hope +love)"}}}
          

          And we should probably change collation to be an array instead. Repeated keys in JSON are legal ,but not nice to deal with.

          Show
          Yonik Seeley added a comment - Is this the patch that added multiple collations? I'm seeing stuff like this: {... "collation" : "lowerfilt:(+faith +hope +loaves)" , "collation" : "lowerfilt:(+faith +hope +love)" }}} And we should probably change collation to be an array instead. Repeated keys in JSON are legal ,but not nice to deal with.
          Hide
          James Dyer added a comment -

          Here is a patch for the 3.x branch. This includes Yonik's fix to close the searcher (thanks!). All tests pass.

          Grant, do you feel this is something that can safely go into the 3.x branch in addition to Trunk?

          (by the way, I am looking into Yonik's suggestion to change multiple collation results into an Array. The trick here, I think, is to not break backwards-compatibility...)

          Show
          James Dyer added a comment - Here is a patch for the 3.x branch. This includes Yonik's fix to close the searcher (thanks!). All tests pass. Grant, do you feel this is something that can safely go into the 3.x branch in addition to Trunk? (by the way, I am looking into Yonik's suggestion to change multiple collation results into an Array. The trick here, I think, is to not break backwards-compatibility...)
          Hide
          Grant Ingersoll added a comment -

          I already did the merge to 3.x. I don't believe Yonik has backported his fix yet.

          Show
          Grant Ingersoll added a comment - I already did the merge to 3.x. I don't believe Yonik has backported his fix yet.
          Hide
          James Dyer added a comment -

          Maybe I'm looking at the wrong place. I checked out the 3.x branch at: http://svn.apache.org/repos/asf/lucene/dev/branches/branch_3x . Is that correct?

          If you drill into ../solr/src/java/org/apache/solr/spelling/ from there you won't find the added source files from this case (SpellCheckCollator.java, etc...)

          Show
          James Dyer added a comment - Maybe I'm looking at the wrong place. I checked out the 3.x branch at: http://svn.apache.org/repos/asf/lucene/dev/branches/branch_3x . Is that correct? If you drill into ../solr/src/java/org/apache/solr/spelling/ from there you won't find the added source files from this case (SpellCheckCollator.java, etc...)
          Hide
          Grant Ingersoll added a comment -

          James, you are right. I mislabeled my merge. Still getting used to this merge from trunk to branch stuff. At any rate, no need for a patch, I will get the merged figured out soon.

          Show
          Grant Ingersoll added a comment - James, you are right. I mislabeled my merge. Still getting used to this merge from trunk to branch stuff. At any rate, no need for a patch, I will get the merged figured out soon.
          Hide
          James Dyer added a comment -

          update the 1.4.1 patch to include Yonik's fix.

          Show
          James Dyer added a comment - update the 1.4.1 patch to include Yonik's fix.
          Hide
          James Dyer added a comment -

          Here's an attempt to implement Yonik's suggestion to have multiple collations returned as an Array rather than use repeated keys. I am not familiar with JSON so I didn't realize the original format would cause problems.

          From this perspective, however, I like the original version better. The problem is in order to maintain backwards-compatibility, if "spellcheck.maxCollations" is unset or set to "1", then we need to return a single String with key "collation". This patch alters the response only if "spellcheck.maxCollations" is >1, instead returning an array with key "collations".

          I also changed the distributed code and solrj to cope with the change in format. All tests pass, but maybe someone will find a better solution than this, or perhaps we can leave it as is.

          Show
          James Dyer added a comment - Here's an attempt to implement Yonik's suggestion to have multiple collations returned as an Array rather than use repeated keys. I am not familiar with JSON so I didn't realize the original format would cause problems. From this perspective, however, I like the original version better. The problem is in order to maintain backwards-compatibility, if "spellcheck.maxCollations" is unset or set to "1", then we need to return a single String with key "collation". This patch alters the response only if "spellcheck.maxCollations" is >1, instead returning an array with key "collations". I also changed the distributed code and solrj to cope with the change in format. All tests pass, but maybe someone will find a better solution than this, or perhaps we can leave it as is.
          Hide
          Grant Ingersoll added a comment -

          James, I did the merge back to 3.x. I don't think we will be backporting this to 1.4, since all future releases there are bug-fix only.

          Show
          Grant Ingersoll added a comment - James, I did the merge back to 3.x. I don't think we will be backporting this to 1.4, since all future releases there are bug-fix only.
          Hide
          James Dyer added a comment -

          Great. Thank you.

          The 1.4 patch is mostly for my benefit, so we can use the functionaltiy before the next release. Thought I'd share that with anyone else who wants to try it too...

          Show
          James Dyer added a comment - Great. Thank you. The 1.4 patch is mostly for my benefit, so we can use the functionaltiy before the next release. Thought I'd share that with anyone else who wants to try it too...
          Hide
          Yonik Seeley added a comment - - edited

          James, I did the merge back to 3.x.

          FYI, you missed Robert's resource leak fixes to SpellCheckCollatorTest.
          Not sure what best practice is to catch stuff like this... if it's only a file or two, I guess check the history of each?

          edit: actually your backport to 3x didn't even touch SpellCheckCollatorTest. I was misled by the fact that when you look at the history of SpellCheckCollatorTest, it shows an update. But I guess it was just merge properties. Ugh.

          yonik@WOLVERINE /cygdrive/c/code/lusolr_3x
          $ svn log ./solr/src/test/org/apache/solr/spelling/SpellCheckCollatorTest.java
          ------------------------------------------------------------------------
          r1026000 | gsingers | 2010-10-21 09:48:34 -0400 (Thu, 21 Oct 2010) | 1 line
          
          SOLR-2010, including Yonik's fix, SOLR-2181 -- hope I did this merge correctly
          ------------------------------------------------------------------------
          r1021439 | gsingers | 2010-10-11 13:32:11 -0400 (Mon, 11 Oct 2010) | 1 line
          
          SOLR-2010: added richer support for spell checking collations
          ------------------------------------------------------------------------
          
          yonik@WOLVERINE /cygdrive/c/code/lusolr_3x
          $ svn diff -r 1021439:1026000 ./solr/src/test/org/apache/solr/spelling/SpellCheckCollatorTest.java                                                   
          yonik@WOLVERINE /cygdrive/c/code/lusolr_3x
          

          I'm in the process of getting branch_3x to pass the searcher open/close test, so I'll handle this.

          Show
          Yonik Seeley added a comment - - edited James, I did the merge back to 3.x. FYI, you missed Robert's resource leak fixes to SpellCheckCollatorTest. Not sure what best practice is to catch stuff like this... if it's only a file or two, I guess check the history of each? edit: actually your backport to 3x didn't even touch SpellCheckCollatorTest. I was misled by the fact that when you look at the history of SpellCheckCollatorTest, it shows an update. But I guess it was just merge properties. Ugh. yonik@WOLVERINE /cygdrive/c/code/lusolr_3x $ svn log ./solr/src/test/org/apache/solr/spelling/SpellCheckCollatorTest.java ------------------------------------------------------------------------ r1026000 | gsingers | 2010-10-21 09:48:34 -0400 (Thu, 21 Oct 2010) | 1 line SOLR-2010, including Yonik's fix, SOLR-2181 -- hope I did this merge correctly ------------------------------------------------------------------------ r1021439 | gsingers | 2010-10-11 13:32:11 -0400 (Mon, 11 Oct 2010) | 1 line SOLR-2010: added richer support for spell checking collations ------------------------------------------------------------------------ yonik@WOLVERINE /cygdrive/c/code/lusolr_3x $ svn diff -r 1021439:1026000 ./solr/src/test/org/apache/solr/spelling/SpellCheckCollatorTest.java yonik@WOLVERINE /cygdrive/c/code/lusolr_3x I'm in the process of getting branch_3x to pass the searcher open/close test, so I'll handle this.
          Hide
          Yonik Seeley added a comment -

          Still stuff messed up with the merge props I guess - when I try to merge in Robert's fixes, it does nothing (it thinks they are already merged).
          I guess I just need to copy the file at this point.

          Show
          Yonik Seeley added a comment - Still stuff messed up with the merge props I guess - when I try to merge in Robert's fixes, it does nothing (it thinks they are already merged). I guess I just need to copy the file at this point.
          Hide
          Robert Muir added a comment -

          Uh oh, what did I do here... the issue is a bit long, how can I help with merging?

          Show
          Robert Muir added a comment - Uh oh, what did I do here... the issue is a bit long, how can I help with merging?
          Hide
          James Dyer added a comment -

          Robert, your changes to fix resource leak in SpellCheckCollatorTest was merged into 3.x by Yonik in r1026921. I believe this issue should have been long closed as it was committed to Trunk and 3.x last year.

          Show
          James Dyer added a comment - Robert, your changes to fix resource leak in SpellCheckCollatorTest was merged into 3.x by Yonik in r1026921. I believe this issue should have been long closed as it was committed to Trunk and 3.x last year.
          Hide
          Robert Muir added a comment -

          Thanks James!

          Show
          Robert Muir added a comment - Thanks James!
          Hide
          Uwe Schindler added a comment -

          Bulk close after release of 3.1

          Show
          Uwe Schindler added a comment - Bulk close after release of 3.1

            People

            • Assignee:
              Grant Ingersoll
              Reporter:
              James Dyer
            • Votes:
              5 Vote for this issue
              Watchers:
              8 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development