SOLR-788

MoreLikeThis should support distributed search

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.1, 5.0
    • Component/s: MoreLikeThis
    • Labels:
      None

      Description

      The MoreLikeThis component should support distributed processing.

      See SOLR-303.

      1. SolrMoreLikeThisPatch.txt
        12 kB
        Matthew Woytowitz
      2. SOLR-788.patch
        41 kB
        Mark Miller
      3. SOLR-788.patch
        24 kB
        Mark Miller
      4. MoreLikeThisComponentTest.patch
        9 kB
        Matthew Woytowitz
      5. MLT.patch
        13 kB
        Jamie Johnson
      6. MLT.patch
        12 kB
        Jamie Johnson
      7. AlternateDistributedMLT.patch
        15 kB
        Mike Anderson

          Activity

          Matthew Woytowitz added a comment -

          This patch adds support for moreLikeThis to the distributed search.

          Mark Miller added a comment -

          Great Matthew! Very happy to see these distributed components issues moving forward. Any chance you can attach a couple junit tests with these patches?

          Matthew Woytowitz added a comment -

          I'll take a look at the existing Unit tests.

          I'm not sure what type of test coverage my unit tests will have, considering this is distributed (multi-computer) and multi-threaded code.

          Mark Miller added a comment -

          There are some distributed tests already that you can build from. Take a peek in the test package.

          Matthew Woytowitz added a comment -

          Added some test cases; by no means complete.

          Description of changes made for MoreLikeThisComponent:

          Added a new purpose to ShardRequest.
          This is for the new shard requests that execute the boolean query, which process now returns when isShard is true.

          Added method HandleResponse.
          Creates a shard request for each element returned in moreLikeThis during the EXEC_QUERY stage. Every shard executes the MoreLikeThis query to find the best matches for a given document.

          Added finishedStage.

          Checks whether the stage is GET_FIELDS and MLT == true. Loops through every shard response, finds those with the new MLT_RESULTS purpose, and adds them to the response after they are sorted, trimmed for length, and verified to be in the response.
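
          The sort-and-trim step described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code from the patch: MltMergeSketch, Hit, and merge are hypothetical names, and it assumes each shard reports (id, score) pairs for one source document.

```java
import java.util.ArrayList;
import java.util.List;

final class MltMergeSketch {
    /** Hypothetical stand-in for one similar document reported by a shard. */
    static final class Hit {
        final String id;
        final float score;

        Hit(String id, float score) {
            this.id = id;
            this.score = score;
        }
    }

    /** Pools the per-shard hits, sorts by descending score, keeps at most count. */
    static List<Hit> merge(List<List<Hit>> shardHits, int count) {
        List<Hit> all = new ArrayList<>();
        for (List<Hit> hits : shardHits) {
            all.addAll(hits);
        }
        // List.sort is stable, so hits with equal scores keep their shard order.
        all.sort((a, b) -> Float.compare(b.score, a.score));
        int end = Math.max(0, Math.min(count, all.size()));
        return new ArrayList<>(all.subList(0, end));
    }
}
```

          Note that this compares raw scores across shards, which is only meaningful to the extent that the shards score documents against similar term statistics.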

          Matthew Woytowitz added a comment -

          MoreLikeThisComponentTest.patch should have been marked as accepting the Apache license. Sorry.

          Mike Anderson added a comment -

          What release of SOLR should one apply this patch to?

          (I tried an older build of 1.4 and got
          patching file org/apache/solr/handler/MoreLikeThisHandler.java
          patching file org/apache/solr/handler/component/MoreLikeThisComponent.java
          Hunk #2 FAILED at 51.
          1 out of 2 hunks FAILED -- saving rejects to file org/apache/solr/handler/component/MoreLikeThisComponent.java.rej
          patching file org/apache/solr/handler/component/ShardRequest.java
          )

          Matthew Woytowitz added a comment -

          It's on the top of the patch. 772437

          --- org/apache/solr/handler/MoreLikeThisHandler.java (revision 772437)
          +++ org/apache/solr/handler/MoreLikeThisHandler.java (working copy)

          Matt Woytowitz
          Software Engineer
          ManTech International Corporation
          Phone: (703) 674-3674
          Email: matthew.woytowitz@mantech.com

          Mike Anderson added a comment - - edited

          Yep, I got that part figured out finally. Unfortunately I'm getting back 0 results when I pass the shards parameter, as opposed to when it is omitted.

          http://localhost:8983/solr/select?q=graph&mlt=true&mlt.fl=title&mlt.mindf=1&mlt.mintf=1&fl=id,score,title&shards=localhost:8983/solr

          returns:
          <lst name="moreLikeThis">
          <result name="018639b9dfd5003c20c3ceb29df9d582" numFound="0" start="0" maxScore="0.0"/>
          <result name="83de9bc1953e36e44df8e95661983183" numFound="0" start="0" maxScore="0.0"/>
          </lst>

          whereas

          http://localhost:8983/solr/select?q=graph&mlt=true&mlt.fl=title&mlt.mindf=1&mlt.mintf=1&fl=id,score,title

          returns

          <lst name="moreLikeThis">
          <result name="018639b9dfd5003c20c3ceb29df9d582" numFound="1198" start="0" maxScore="3.3357687"/>
          ...result docs
          <result name="83de9bc1953e36e44df8e95661983183" numFound="487" start="0" maxScore="4.129801"/>
          ...result docs
          </lst>

          However, perhaps more pressing is that when the shards param is set my spellCheck component stops responding (I had to apply the distributed spellcheck patch as well). yikes...

          I poked around in the code, but couldn't really make any progress. Any help would be greatly appreciated.

          -mike

          Matthew Woytowitz added a comment -

          It's been 3 months since I looked at this. Sounds familiar. Here are the params I pass with every MLT query.

          private int minTermFrequency = MoreLikeThis.DEFAULT_MIN_TERM_FREQ;
          private int minWordLength = MoreLikeThis.DEFAULT_MIN_WORD_LENGTH;
          private int maxWordLength = MoreLikeThis.DEFAULT_MAX_WORD_LENGTH;
          private int maxQueryTerms = MoreLikeThis.DEFAULT_MAX_QUERY_TERMS;
          private int minDocFreq = MoreLikeThis.DEFAULT_MIN_DOC_FREQ;
          private int maxTokensToParse = MoreLikeThis.DEFAULT_MAX_NUM_TOKENS_PARSED;

          ....

          params.add(MoreLikeThisParams.MLT, Boolean.TRUE.toString());
          params.add(MoreLikeThisParams.SIMILARITY_FIELDS, similarFields.split(","));
          params.add(MoreLikeThisParams.MIN_TERM_FREQ, minTermFrequency + "");
          params.add(MoreLikeThisParams.MIN_WORD_LEN, minWordLength + "");
          params.add(MoreLikeThisParams.MAX_WORD_LEN, maxWordLength + "");
          params.add(MoreLikeThisParams.MAX_QUERY_TERMS, maxQueryTerms + "");
          params.add(MoreLikeThisParams.MAX_NUM_TOKENS_PARSED, maxTokensToParse + "");
          params.add(MoreLikeThisParams.MIN_DOC_FREQ, minDocFreq + "");
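
          For comparison, the same kind of request can be assembled as a raw URL. The sketch below is a minimal illustration under stated assumptions: MltUrlSketch, buildUrl, and exampleParams are hypothetical names, and the parameter values are the ones from the request quoted earlier in this thread (q=graph, mlt.fl=title, and so on), not recommended settings.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

final class MltUrlSketch {
    /** URL-encodes one key or value as UTF-8. */
    static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    /** Appends the parameters to base as an encoded query string. */
    static String buildUrl(String base, Map<String, String> params) {
        StringBuilder sb = new StringBuilder(base);
        char sep = '?';
        for (Map.Entry<String, String> e : params.entrySet()) {
            sb.append(sep).append(enc(e.getKey())).append('=').append(enc(e.getValue()));
            sep = '&';
        }
        return sb.toString();
    }

    /** Parameters taken from the distributed request quoted in this thread. */
    static Map<String, String> exampleParams() {
        Map<String, String> p = new LinkedHashMap<>();
        p.put("q", "graph");
        p.put("mlt", "true");
        p.put("mlt.fl", "title");
        p.put("mlt.mindf", "1");
        p.put("mlt.mintf", "1");
        p.put("fl", "id,score,title");
        p.put("shards", "localhost:8983/solr"); // drop this entry for the single-node case
        return p;
    }
}
```

          Calling MltUrlSketch.buildUrl("http://localhost:8983/solr/select", MltUrlSketch.exampleParams()) gives a URL that can be pasted into a browser, with and without the shards entry, to compare the two responses.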

          Are you using a stock solr config? Can you send me the solr config and schema.xml?

          Are you logging the incoming queries to solr?
          You should see three requests. Your request, the shard request to get scores and ids and finally a request to return the fields you requested for the best matches.

          What does the second query look like? Take a look at that in your browser.
          If you run that query what do your results look like?

          Matt Woytowitz
          Software Engineer
          ManTech International Corporation
          Phone: (703) 674-3674
          Email: matthew.woytowitz@mantech.com

          Mike Anderson added a comment - - edited

          I had trouble getting this to work in my distributed setup so I changed the patch around (for better or worse) to make it flow in a way that made sense to me.

          Just wanted to post in case anybody else was having trouble.

          Some thoughts on response builder/ distributed components: It would be nice to allow components to add requests (in a natural way) to response builder after the QueryComponent has made it through finishedStage and merged all the results. This could optimize MLT so that instead of finding MLT for the top 5 hits from each shard, we find MLT for the top 5 hits overall. (maybe there's a way to do this, but I couldn't really find the intuition behind it) .

          (attached patch is a modified version of Matt's)

          mike
          mike_a@mit.edu

          Shawn Heisey added a comment -

          I couldn't get the original patch to work on the 4.0 trunk or branch_3x. It would apply, but not compile.

          I did get the alternate patch to apply and compile with the branch_3x version downloaded last night. Part of that was changing the constant for the purpose to 0x800 since a different one with 0x400 had been added already.
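
          To illustrate why the constant had to move: the sketch below assumes, as the 0x400 and 0x800 values suggest, that request purposes are single-bit int flags combined with OR and tested with AND. The class and constant names are hypothetical and are not copied from ShardRequest.

```java
final class PurposeFlagsSketch {
    // Hypothetical values for illustration only.
    static final int PURPOSE_ALREADY_TAKEN = 0x400; // a purpose added by someone else
    static final int PURPOSE_MLT_OLD = 0x400;       // value from the older patch, now clashing
    static final int PURPOSE_MLT_NEW = 0x800;       // next free bit

    /** True if the combined purpose mask contains the given flag. */
    static boolean hasPurpose(int purpose, int flag) {
        return (purpose & flag) != 0;
    }
}
```

          With the old value, a request carrying only PURPOSE_ALREADY_TAKEN would also test positive for the MLT purpose, so the component could pick up responses that were never meant for it.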

          When I add a shards parameter, it no longer works and says "undefined field id" twice and spits out "request: " with the URL of the shard.

          Have things changed enough in the last several months that this patch will require reworking, or did I just miss something simple? If you need info from me, let me know how to get it.

          Matthew Woytowitz added a comment -

          In the tail end of development cycle and won't have time to look at it till end of month.

          Patch is a year old at this point. I think the patch has a revision number on it. I would try checking out that revision from SVN, then patch, then update.

          Hope that helps,

          Matt

          Hoss Man added a comment -

          Bulk updating 240 Solr issues to set the Fix Version to "next" per the process outlined in this email...

          http://mail-archives.apache.org/mod_mbox/lucene-dev/201005.mbox/%3Calpine.DEB.1.10.1005251052040.24672@radix.cryptio.net%3E

          Selection criteria was "Unresolved" with a Fix Version of 1.5, 1.6, 3.1, or 4.0. email notifications were suppressed.

          A unique token for finding these 240 issues in the future: hossversioncleanup20100527

          Andrey Strizhkin added a comment -

          applied AlternateDistributedMLT.patch to trunk (rev 1003607) and got NPE

          23:53:15,452 ERROR [SolrDispatchFilter] java.lang.NullPointerException
                  at org.apache.solr.handler.component.MoreLikeThisComponent.finishStage(MoreLikeThisComponent.java:147)
                  at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:315)
                  at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
                  at org.apache.solr.core.SolrCore.execute(SolrCore.java:1325)
                  at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:337)
                  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:240)
                  at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1157)
                  at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:388)
                  at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
                  at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
                  at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:765)
                  at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:418)
                  at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
                  at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
                  at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
                  at org.mortbay.jetty.Server.handle(Server.java:326)
                  at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
                  at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:926)
                  at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
                  at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
                  at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
                  at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410)
                  at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
          

          Fixed the NPE, but MLT still doesn't work correctly: I think it returns 'like' documents only from the shard where the document requested in the query is physically hosted.

          Robert Muir added a comment -

          Bulk move 3.2 -> 3.3

          Robert Muir added a comment -

          3.4 -> 3.5

          Mark Miller added a comment -

          Hmm - bummer - this issue has lots of action, lots of watchers, lots of votes, but has kind of fallen through the cracks. Hopefully I can find some time to look at bringing the last patch up to trunk sometime soon.

          Jamie Johnson added a comment -

          I modified the patch to apply to trunk, I'm not sure if it's working as expected (mainly because I'm not super familiar with MLT) but it builds. Might save someone some time.

          Vadim Kisselmann added a comment -

          This patch works, but not perfectly.
          MLT distributed search works, but if I use more than one field with "mlt.fl", I get an "HTTP Error 400".
          With only one mlt field, no problem.

          Jamie Johnson added a comment -

          I haven't had a chance to really play with this lately, can you give an example of the query you are running?

          Jamie Johnson added a comment -

          After spending a bit of time looking at this, it appears that something has changed since this patch was last written which is preventing it from working properly. I didn't write the original patch so I'm having difficulty figuring out specifically what is wrong. Currently I am getting the following when running this:

           
          <lst name="moreLikeThis">
          <null name="8001ed40-5b54-4ca6-9a17-ffb16179a1de"/>
          <null name="652bfc99-96dd-49a3-8232-057995788b93"/>
          <null name="f422dfbd-d534-490b-b86c-6d0e6586dc7c"/>
          <null name="a0eb8e1b-299e-41cc-a52c-36c2d75f7171"/>
          <null name="05e01ad4-9d7a-4399-931b-257494ed9385"/>
          <null name="894fa0ac-4ac7-4121-a9c5-45c24ba5e6dd"/>
          <null name="b70b4ac4-ac09-42d7-8728-2aa6e236b757"/>
          <null name="be92fa6b-fbf1-4688-8f2f-edbd659ec50e"/>
          <null name="4fa6fb91-8433-4bde-866c-0102b3070f88"/>
          <null name="04109cda-f7e1-4280-903c-e1564585b3e8"/>
          </lst>
          

          If I run with distrib=false this works, so it is definitely something with the patch.

          Jamie Johnson added a comment -

          I tracked down what was causing the issue on my part: the original patch assumed the unique key field was "id", and in my index it's "key". I've updated the patch to look that up now. I also supplied multiple fields and that worked properly (as far as I can tell).
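
          The failure mode is easy to reproduce in isolation. The sketch below is a minimal illustration, not the patch code: UniqueKeySketch and keyOf are hypothetical names, and a document is modelled as a plain map. In Solr itself the field name would come from the schema rather than being passed in by hand (I believe via the schema's unique key field, but treat the exact call as an assumption).

```java
import java.util.HashMap;
import java.util.Map;

final class UniqueKeySketch {
    /**
     * Returns the document's key using the configured unique key field,
     * or null if the document has no such field.
     */
    static String keyOf(Map<String, Object> doc, String uniqueKeyField) {
        Object value = doc.get(uniqueKeyField);
        return value == null ? null : value.toString();
    }

    /** Builds a sample document whose unique key field is named "key". */
    static Map<String, Object> sampleDoc() {
        Map<String, Object> doc = new HashMap<>();
        doc.put("key", "8001ed40-5b54-4ca6-9a17-ffb16179a1de");
        doc.put("title", "example");
        return doc;
    }
}
```

          Looking up "id" on the sample document returns null, while looking up "key" returns the document's key.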

          Vadim Kisselmann added a comment -

          Hi Jamie,

          Unfortunately, I can't reproduce this bug now, but I will try it this week.
          I use edismax as the default query handler.

          My queries look like this (default select with mlt params):
          http://localhost:8080/solr/shard_1/select?shards=localhost:8080/solr/shard_1,localhost:8080/solr/shard_2&indent=true&q=solr&mlt=true&mlt.fl=text,title&mlt.qf=text&mlt.mintf=1&mlt.minwl=3&mlt.boost=true&rows=10&mlt.mindf=1&start=0

          Or with the mlt handler:
          http://localhost:8080/solr/shard_1/mlt?shards=localhost:8080/solr/shard_1,localhost:8080/solr/shard_2&indent=true&q=solr&mlt.fl=text,title&mlt.qf=text&mlt.mintf=1&mlt.minwl=3&mlt.boost=true&rows=10&mlt.mindf=1&start=0&mlt.interestingTerms=details

          With more than one field in mlt.fl I get "HTTP 400" exceptions.

          Thanks for the new patch, I will test it this week.

          Mark Miller added a comment -

          I missed this recent activity - thanks for grabbing this torch Jamie! Perhaps we can get this in soon.

          Mark Miller added a comment -

          So looks like we need some distrib tests for this - to start though, it looks like the single instance test fails with the latest patch. Have not had a chance to investigate why yet though.

          Mark Miller added a comment -

          Alright - just looks like debug mode is the issue - not working right with MLT and the latest patch.

          Jamie Johnson added a comment -

          I unfortunately can't test this now, I can try to take a look in the next week or so if you don't get to it before me

          Neil Hooey added a comment -

          Has anyone been able to test this yet?

          Jamie Johnson added a comment -

          I unfortunately have not and don't think I'll have the time to do so in the near future.

          The patch was updated to trunk not too long ago so may not be too difficult to revive assuming the original patch worked as expected

          Hoss Man added a comment -

          Bulk fixing the version info for 4.0-ALPHA and 4.0; all affected issues have "hoss20120711-bulk-40-change" in a comment.

          Robert Muir added a comment -

          rmuir20120906-bulk-40-change

          Robert Muir added a comment -

          moving all 4.0 issues not touched in a month to 4.1

          Mark Miller added a comment -

          This patch fixes some formatting in the latest patch and adds the base for some tests as well as one test - it's currently failing.

          Mark Miller added a comment -

          New Patch.

          Adds more tests.

          Fixes a couple bugs that prevented correct results.

          Fixes the debug path for the single node mlt.

          Results are not currently sorted the same way as they are on a single node.

          I don't really have a need or use for this, so if anyone that does could help with testing, that would be great.

          Mark Miller added a comment -

          Okay - well, since I have some tests that pass and this doesn't mess with single node mlt, I'm just going to commit it. It's better than nothing and we can iterate on it as people try it out.

          Commit Tag Bot added a comment -

          [trunk commit] Mark Robert Miller
          http://svn.apache.org/viewvc?view=revision&revision=1421326

          SOLR-788: Distributed search support for MLT.

          Commit Tag Bot added a comment -

          [branch_4x commit] Mark Robert Miller
          http://svn.apache.org/viewvc?view=revision&revision=1421333

          SOLR-788: Distributed search support for MLT.

          Mark Miller added a comment -

          Further bug fixes and improvements can go in other issues.

          Jack Krupansky added a comment -

          I’m curious whether the change in the default value for the mlt.count parameter from 5 in 4.0 to 20 in 4.x is an intentional change or simply a bug that needs to be fixed. I mean, there is no mention in CHANGES.txt or Jira to note the impact on what a user will see.

          Mark Miller added a comment -

          Unintentional change.

          Commit Tag Bot added a comment -

          [trunk commit] Mark Robert Miller
          http://svn.apache.org/viewvc?view=revision&revision=1427218

          SOLR-788: set mlt.count back to 5

          Commit Tag Bot added a comment -

          [branch_4x commit] Mark Robert Miller
          http://svn.apache.org/viewvc?view=revision&revision=1427219

          SOLR-788: set mlt.count back to 5

          Jack Krupansky added a comment -

          Thanks, Mark!

          Suneel Marthi added a comment -

          Is this fix a part of the official Solr 4.1 release now?

          Mark Miller added a comment -

          Yes, though it may be rough around the edges - give it a try.

          Suneel Marthi added a comment - - edited

          I have Solr 4.1 set up and am trying to execute MLT searches, but it doesn't seem to be working (unless I am doing something fundamentally wrong).

          Here's what I did:

          1. Set up Solr 4.1 and modified the <luceneMatchVersion> field in solrconfig.xml to be Lucene_41. I also have the entry below enabled in the config file:
          <requestHandler name="/mlt" class="solr.MoreLikeThisHandler">
          </requestHandler>

          2. We are running distributed Solr servers (about 30 of them, each pointing to its own shard; the shards are not replicated). There is a Master Solr (in addition to the 30 slave Solrs), and all queries are directed to the Master.

          3. Ran the following MLT query:
          http://localhost:8900/solr/collection1/select?q=microsoft&mlt=true&mlt.fl=content&mlt.mindf=1&mlt.mintf=1&fl=id,content

          <id,content> are fields defined in our Solr schema.

          4. Solr seems to execute the query, and I see the error below after a few minutes of trying to execute the above request:

          <response>
          <lst name="responseHeader">
          <int name="status">400</int>
          <int name="QTime">318230</int>
          <lst name="params">
          <str name="mlt.mindf">1</str>
          <str name="fl">id,content</str>
          <str name="mlt.fl">content</str>
          <str name="q">microsoft</str>
          <str name="mlt.mintf">1</str>
          <str name="mlt">true</str>
          </lst>
          </lst>
          <lst name="error">
          <str name="msg">org.apache.solr.search.SyntaxError: Cannot parse '+(content:urn:schemas content:xml:namespace content:prefix content:ns content:microsoft content:com:vml content:com:office:office content:v content:o) -id:nordstoga.com': Encountered " ":" ": "" at line 1, column 13.
          Was expecting one of:
          <AND> ...
          <OR> ...
          <NOT> ...
          "+" ...
          "-" ...
          <BAREOPER> ...
          "(" ...
          ")" ...
          "*" ...
          "^" ...
          <QUOTED> ...
          <TERM> ...
          <FUZZY_SLOP> ...
          <PREFIXTERM> ...
          <WILDTERM> ...
          <REGEXPTERM> ...
          "[" ...
          "{" ...
          <LPARAMS> ...
          <NUMBER> ...
          </str>
          <int name="code">400</int>
          </lst>
          </response>

          Am I doing this right?

          Mark Miller added a comment -

          Why did you point out the mlt handler was enabled, but then you use the select handler? Does your select handler have the mlt component in it?

          You might want to take this to the user list.

          Colin Bartolome added a comment -

          It seems that you may or may not get interesting terms, depending on which shard serves the request. (I was getting very confused, because my manually constructed URL was working, while my SolrJ request was not, until I noticed that the former was being served by shard1 and the latter by shard2.) I'm guessing you'll get no results if the shard that serves your request doesn't contain the document you're trying to query.

          I'll try to tighten up a test case and get it filed, but I thought I'd mention it here, in case anybody had suspicions.

          Bill Mitchell added a comment -

          Suneel Marthi's issue above, where the derivative query passed to the shard is invalid, is similar to the issue I documented for numeric keys in SOLR-5521. Here, the query terms extracted from the bean for which we are searching for similar beans include terms with embedded colons. When the MoreLikeThis component under the search handler builds a MoreLikeTheseQuery, the extracted query terms need to be quoted.
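
          As a rough illustration of the point above, here is a minimal Java sketch of making extracted terms safe before they are assembled into a query string. It backslash-escapes parser metacharacters, which is an alternative to quoting and not necessarily what the component should do. The class and method names (MltTermEscaper, escapeTerm, buildQuery) are hypothetical and not part of Solr; SolrJ's ClientUtils.escapeQueryChars is an existing helper that serves a similar purpose. The set of characters escaped here is an assumption based on the classic query parser syntax.

```java
import java.util.List;
import java.util.StringJoiner;

// Hypothetical helper for illustration only; not part of Solr.
final class MltTermEscaper {
    // Characters assumed to be treated as syntax by the classic query parser.
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&/";

    // Prefix every metacharacter or whitespace character with a backslash.
    static String escapeTerm(String term) {
        StringBuilder sb = new StringBuilder(term.length() + 8);
        for (int i = 0; i < term.length(); i++) {
            char c = term.charAt(i);
            if (SPECIAL.indexOf(c) >= 0 || Character.isWhitespace(c)) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    // Build "+(field:t1 field:t2 ...) -idField:excludeId" with escaped values.
    static String buildQuery(String field, List<String> terms,
                             String idField, String excludeId) {
        StringJoiner clauses = new StringJoiner(" ", "+(", ")");
        for (String t : terms) {
            clauses.add(field + ":" + escapeTerm(t));
        }
        return clauses + " -" + idField + ":" + escapeTerm(excludeId);
    }

    public static void main(String[] args) {
        // Terms taken from the error message reported earlier in this thread.
        System.out.println(buildQuery("content",
                List.of("urn:schemas", "microsoft"), "id", "nordstoga.com"));
    }
}
```

          With escaping like this, a term such as urn:schemas reaches the parser as a single term in the content field, so the embedded colon is no longer read as a second field separator.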


            People

            • Assignee:
              Mark Miller
              Reporter:
              Grant Ingersoll
            • Votes:
              16
              Watchers:
              23
