Solr / SOLR-295

Implementing MoreLikeThis support in DismaxRequestHandler



    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 1.3
    • Fix Version/s: 1.3
    • Component/s: search
    • Labels:


      There's nothing too clever about this initial patch (to be uploaded shortly): I have simply extracted the MLT code from StandardRequestHandler and inserted it into DismaxRequestHandler. However, there are some broader MLT issues that I'd also like to address in the near future:

      1) (trivial) There is no "This response format is experimental" warning when MLT is used with StandardRequestHandler (or DismaxRequestHandler). It's not really a big deal, but such a warning would at least make developers aware that the format may change in the future.

      2) (trivial) "org.apache.solr.common.util.MoreLikeThisParams" should perhaps be moved to the more appropriate package "org.apache.solr.common.params".

      3) (non-trivial) The ability to specify the list of fields that should be returned when MLT is invoked from an external handler (e.g. StandardRequestHandler). Currently the field list ("fl") parameter is inherited from the main query, but I can envisage cases where it would be desirable to return more or fewer fields in the MLT results than in the main query. One complication is that "mlt.fl" is already used to specify the fields used for similarity. Perhaps "mlt.fl" is not the best name for that parameter and should be renamed to avoid potential conflict or confusion?
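
      A minimal sketch of the fallback rule in point 3, assuming a hypothetical parameter "mlt.returnfl" (an illustrative name only, not an existing Solr parameter) that overrides the inherited "fl" for the MLT results. "mlt.fl" keeps its current meaning and is not consulted. Parameters are modelled as a plain dict of strings, and only comma-separated lists are handled for brevity.

```python
def resolve_mlt_return_fields(params):
    """Return the field list for documents in the MLT section of a response.

    Hypothetical: "mlt.returnfl" is an illustrative name, not a real Solr
    parameter. "mlt.fl" (fields used for similarity) is deliberately ignored.
    """
    # Prefer an explicit MLT-specific return list when one is supplied...
    explicit = params.get("mlt.returnfl")
    if explicit:
        return [f.strip() for f in explicit.split(",") if f.strip()]
    # ...otherwise inherit the main query's "fl", which is the current behaviour.
    main_fl = params.get("fl", "*")
    return [f.strip() for f in main_fl.split(",") if f.strip()]


# Main query returns id, title and score; MLT results return only id.
request = {"q": "solr", "fl": "id,title,score", "mlt": "true",
           "mlt.fl": "body", "mlt.returnfl": "id"}
print(resolve_mlt_return_fields(request))
```

      Keeping the new name distinct from "mlt.fl" avoids the ambiguity described above without breaking existing requests.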

      4) (fairly trivial) On a similar note to 3, there is currently no way to specify a "start" value for the rows returned when MLT is invoked from an external handler (e.g. StandardRequestHandler); it is hard-coded to 0 (i.e. the first "mlt.count" matching documents are returned). While I can see the logic in naming the parameter "mlt.count", it seems a little inconsistent, and it might be better to rename (or at least alias) it to "mlt.rows" for consistency with the CommonQueryParameters. Note that "mlt.start" is fundamentally different from the "mlt.match.offset" parameter: the latter deals with documents matching the initial MLT query, while the former deals with documents returned by the MLT query (hope that makes sense).
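
      To make the distinction in point 4 concrete, here is a minimal sketch with hypothetical names. A ranked list of matched ids stands in for the initial query, and a lookup function stands in for the MLT similarity search. "mlt.match.offset" selects which matched document seeds the MLT query. The proposed "mlt.start" pages through the documents that the MLT query returns. A start of 0 reproduces the current hard-coded behaviour.

```python
def mlt_page(matched_ids, similar_for, match_offset=0, start=0, rows=5):
    """Return (seed_id, page_of_similar_ids).

    matched_ids: ranked ids matched by the initial query.
    similar_for: hypothetical stand-in mapping a seed id to its ranked
                 list of similar ids (the MLT query's results).
    """
    # mlt.match.offset: WHICH matched document seeds the MLT query.
    seed = matched_ids[match_offset]
    # mlt.start (proposed): WHERE to begin within the MLT query's results.
    similar = similar_for(seed)
    return seed, similar[start:start + rows]


similar = {"a": ["x", "y", "z"], "b": ["p", "q", "r", "s"]}
print(mlt_page(["a", "b"], similar.__getitem__, match_offset=1, start=1, rows=2))
```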

      I have created a patch that implements "mlt.start" (to specify the start doc) and adds "mlt.rows", which can be used interchangeably with "mlt.count" (though I would prefer to remove "mlt.count" altogether). However, because it changes the method signature of MoreLikeThisHelper.getMoreLikeThese(), I wanted to get some opinions before submitting it.
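
      The following sketch shows one way the alias could be resolved. It is written against a plain dict of request parameters and does not reflect the actual patch or the Java code in MoreLikeThisHelper. The proposed "mlt.rows" takes precedence over the legacy "mlt.count", and a default of 5 rows is assumed when neither is given.

```python
DEFAULT_MLT_ROWS = 5  # assumed default for the number of MLT docs per result


def resolve_mlt_paging(params):
    """Return (start, rows) for the MLT section of a response.

    Sketch only: "mlt.start" and "mlt.rows" are the parameters proposed in
    this issue; "mlt.count" is kept as a backwards-compatible alias.
    """
    start = int(params.get("mlt.start", 0))
    # Prefer the new name, fall back to the legacy name, then the default.
    if "mlt.rows" in params:
        rows = int(params["mlt.rows"])
    elif "mlt.count" in params:
        rows = int(params["mlt.count"])
    else:
        rows = DEFAULT_MLT_ROWS
    if start < 0 or rows < 0:
        raise ValueError("mlt.start and mlt.rows must be non-negative")
    return start, rows


print(resolve_mlt_paging({"mlt.count": "3"}))
print(resolve_mlt_paging({"mlt.start": "20", "mlt.rows": "10", "mlt.count": "3"}))
```

      Keeping "mlt.count" as a fallback means existing requests behave the same, while "mlt.rows" lines up with the top-level "rows" parameter.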

      5) (non-trivial) Interesting terms: the ability to return interesting-term information using the "mlt.interestingTerms" parameter when MLT is invoked from an external handler. This is perhaps the most useful feature I am looking to implement; I can see great benefit in being able to provide a list of interesting terms or "keywords" for each document returned by a standard or dismax query. Currently this is only available from the MLT request handler, so perhaps the best approach would be to refactor the "interestingTerms" code in the MoreLikeThisHandler class into MoreLikeThisHelper so that it is available to all handlers? Again, I would appreciate any comments or suggestions.
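
      For context on point 5, the following sketch shows roughly how MoreLikeThis-style "interesting terms" can be ranked. Each term in a document is scored by tf * idf, terms below a minimum term frequency or document frequency are dropped, and the top N are kept. The idf formula 1 + ln(numDocs / (docFreq + 1)) and the thresholds (min tf 2, min df 5, 25 terms) are assumed to mirror Lucene's classic defaults. This is an approximation in Python, not the code in MoreLikeThisHandler.

```python
import math
from collections import Counter


def interesting_terms(doc_tokens, doc_freq, num_docs,
                      min_tf=2, min_df=5, max_terms=25):
    """Return up to max_terms (term, score) pairs, highest score first.

    doc_tokens: analysed tokens of one document.
    doc_freq:   term -> number of indexed documents containing it.
    num_docs:   total number of indexed documents.
    """
    scored = []
    for term, tf in Counter(doc_tokens).items():
        df = doc_freq.get(term, 0)
        # Skip terms too rare in this doc or across the index to be useful.
        if tf < min_tf or df < min_df:
            continue
        # Assumed classic idf: 1 + ln(numDocs / (docFreq + 1)).
        idf = 1.0 + math.log(num_docs / (df + 1))
        scored.append((term, tf * idf))
    # Highest score first; ties broken alphabetically for stable output.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:max_terms]


tokens = ["solr"] * 3 + ["the"] * 5 + ["dismax"] * 2 + ["rare"]
freqs = {"solr": 10, "the": 990, "dismax": 5, "rare": 1}
print([term for term, _ in interesting_terms(tokens, freqs, num_docs=1000)])
```

      Because the ranking depends only on the document's term vector and index statistics, it would not need anything specific to MoreLikeThisHandler. That is what makes moving it into a shared helper plausible.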

      I've also noted the MLT features suggested by Tristan [ http://www.nabble.com/MoreLikeThis-with-DisMax-boost-query---functions-tf4047187.html ], which could quite possibly be rolled together with the above points. I'm not sure whether it is better to have a single ticket tracking several related issues or to create individual tickets for each issue, but I will be happy to follow the Solr issue tracking policy on advice from the core developers.



          Issue Links



              • Assignee:
                pberkel Pieter Berkel
              • Votes:
                0
              • Watchers:
                0


                • Created: