
Key: SOLR-295
Type: Improvement
Status: Resolved
Resolution: Fixed
Priority: Minor
Assignee: Unassigned
Reporter: Pieter Berkel
Votes: 0
Watchers: 0

Solr

Implementing MoreLikeThis support in DismaxRequestHandler

Created: 10/Jul/07 02:11 AM   Updated: 15/Apr/08 11:28 PM
Component/s: search
Affects Version/s: 1.3
Fix Version/s: 1.3

Time Tracking:
Not Specified

File Attachments:
MoreLikeThis-DismaxRequestHandler_SOLR-295.patch (text file, 1 kB, licensed for inclusion in ASF works), 2007-07-10 02:13 AM, Pieter Berkel

Resolution Date: 18/Nov/07 09:36 PM


Description
There's nothing too clever about this initial patch, which will be uploaded shortly: I have simply extracted the MLT code from the StandardRequestHandler and inserted it into the DismaxRequestHandler. However, there are some broader MLT issues that I'd also like to address in the near future:

1) (trivial) No "This response format is experimental" warning when MLT is used with StandardRequestHandler (or DismaxRequestHandler). Not really a big deal but at least makes developers aware of the possibility of future changes.

2) (trivial) "org.apache.solr.common.util.MoreLikeThisParams" should perhaps be moved to the more appropriate package "org.apache.solr.common.params".

3) (non-trivial) The ability to specify the list of fields that should be returned when MLT is invoked from an external handler (e.g. StandardRequestHandler). Currently the field list (FL) parameter is inherited from the main query, but I can envisage cases where it would be desirable to specify more or fewer return fields in the MLT query than in the main query. One complication is that "mlt.fl" is already used to specify the fields used for similarity. Perhaps "mlt.fl" is not the best name for this parameter and should be renamed to avoid potential conflict / confusion?

4) (fairly-trivial) On a similar note to 3, there is currently no way to specify a "start" value for the rows returned when MLT is invoked from an external handler (e.g. StandardRequestHandler); it is hard-coded to 0 (i.e. the first "mlt.count" documents matched). While I can see the logic in naming the parameter "mlt.count", it does seem a little inconsistent and perhaps it would be better to rename (or at least alias) it to "mlt.rows" to be consistent with the CommonQueryParameters. Note that "mlt.start" is fundamentally different from the "mlt.match.offset" parameter, as the latter deals with documents matching the initial MLT query while the former deals with documents returned by the MLT query (hope that makes sense).

I have created a patch that implemented "mlt.start" (to specify the start doc) and added "mlt.rows" that could be used interchangeably with "mlt.count" (but I would prefer to remove "mlt.count" altogether), but since it involves changing the method definition of MoreLikeThisHelper.getMoreLikeThese(), I wanted to get some opinions before submitting it.
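To make the proposed semantics in point 4 concrete, here is a minimal sketch of how the three parameters could be resolved. It is not the patch itself: `MltPaging`, `resolve` and `DEFAULT_ROWS` are hypothetical names invented for illustration, the parameters are read from a plain `Map` rather than from Solr's own parameter classes, and the precedence rule ("mlt.rows" wins over "mlt.count") is an assumption.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper (not part of Solr): resolves MLT paging parameters. */
final class MltPaging {
    // Assumed fallback when neither "mlt.rows" nor "mlt.count" is given.
    static final int DEFAULT_ROWS = 5;

    final int start; // offset into the documents returned by the MLT query
    final int rows;  // number of similar documents to return

    private MltPaging(int start, int rows) {
        this.start = start;
        this.rows = rows;
    }

    /** "mlt.rows" takes precedence over "mlt.count"; "mlt.start" defaults to 0. */
    static MltPaging resolve(Map<String, String> params) {
        int start = parseInt(params.get("mlt.start"), 0);
        String rowsValue = params.containsKey("mlt.rows")
                ? params.get("mlt.rows")
                : params.get("mlt.count");
        int rows = parseInt(rowsValue, DEFAULT_ROWS);
        if (start < 0 || rows < 0) {
            throw new IllegalArgumentException("mlt.start and mlt.rows must be non-negative");
        }
        return new MltPaging(start, rows);
    }

    private static int parseInt(String value, int fallback) {
        return value == null ? fallback : Integer.parseInt(value.trim());
    }

    public static void main(String[] args) {
        Map<String, String> params = new HashMap<>();
        params.put("mlt.count", "10"); // existing name still honoured
        params.put("mlt.start", "20"); // proposed new parameter
        MltPaging paging = resolve(params);
        System.out.println("start=" + paging.start + " rows=" + paging.rows);
    }
}
```

With this rule, existing requests that only send "mlt.count" keep working unchanged, while new requests can page through similar documents with "mlt.start" and "mlt.rows".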

5) (non-trivial) Interesting Terms - the ability to return interesting term information using the "mlt.interestingTerms" parameter when MLT is invoked from an external handler. This is perhaps the most useful feature I am looking to implement: I can see great benefit in being able to provide a list of interesting terms or "keywords" for each document returned in a standard or dismax query. Currently this is only available from the MLT request handler, so perhaps the best approach would be to refactor the "interestingTerms" code in the MoreLikeThisHandler class and move it into MoreLikeThisHelper so it is available to all handlers? Again, I would appreciate any comments or suggestions.

I've also noted the MLT features suggested by Tristan [ http://www.nabble.com/MoreLikeThis-with-DisMax-boost-query---functions-tf4047187.html ] which could quite possibly be rolled together with the above points – I'm not sure whether it is better to have a single ticket tracking several related issues or to create individual tickets for each issue, but I will be happy to comply with the Solr issue tracking policy on advice from the core developers.

regards,
Pieter



Pieter Berkel added a comment - 10/Jul/07 02:13 AM
Patch to add MoreLikeThis functionality to the DismaxRequestHandler.

Ryan McKinley added a comment - 10/Jul/07 04:48 AM
I haven't looked at the patch yet. Everything sounds reasonable. I am a bit reluctant to glob MLT on to the dismax request handler because we keep seeing the need to glob on more and more. Recent discussions have pointed towards a 'search component' framework. Something that defines a chain of stuff that could typically happen in a query (dismax+mlt+faceting+faceting on mlt+collapse+highlighting+...). SOLR-281 is a quick/crude implementation.

something to think about...
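For illustration only, the "chain of stuff" idea could look roughly like the sketch below. It is not the SOLR-281 implementation: `SearchStep` and `ComponentChain` are hypothetical names, and the request and response are modelled as plain maps rather than Solr's request/response classes.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical interface for one stage of a query (query, MLT, faceting, ...). */
interface SearchStep {
    void process(Map<String, String> params, Map<String, Object> response);
}

/** Hypothetical handler that runs each configured step in order. */
final class ComponentChain {
    private final List<SearchStep> steps = new ArrayList<>();

    ComponentChain add(SearchStep step) {
        steps.add(step);
        return this;
    }

    Map<String, Object> handle(Map<String, String> params) {
        // LinkedHashMap keeps response sections in the order the steps added them.
        Map<String, Object> response = new LinkedHashMap<>();
        for (SearchStep step : steps) {
            step.process(params, response);
        }
        return response;
    }

    public static void main(String[] args) {
        ComponentChain chain = new ComponentChain()
                // Stand-in for the main (standard or dismax) query step.
                .add((p, rsp) -> rsp.put("response", "docs for " + p.get("q")))
                // Stand-in for an MLT step that only runs when mlt=true.
                .add((p, rsp) -> {
                    if ("true".equals(p.get("mlt"))) {
                        rsp.put("moreLikeThis", "similar docs");
                    }
                });

        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", "solr");
        params.put("mlt", "true");
        System.out.println(chain.handle(params).keySet());
    }
}
```

The point of the design is that MLT is written once as a step and can be added to any handler's chain, instead of being copied into each request handler.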


Pieter Berkel added a comment - 10/Jul/07 07:14 AM
Thanks Ryan, I missed that original thread mentioned in SOLR-281 but completely agree with the line of thinking and proposals, (actually I was thinking the same when I made the above patch). There is little point in duplicating code across request handlers (leading to code bloat as you suggested), refactoring common functionality in separate components is going to ensure consistency in the response format across all handlers.

I'll take a look at the patch submitted on SOLR-281 and see what I can do in terms of implementing my MLT ideas, however until the 'search component' framework concept has really been 'solidified', I'm afraid it's going to be difficult to extend.

regards,
Pieter


Pieter Berkel added a comment - 10/Jul/07 12:49 PM
Probably a good idea to focus efforts on developing pluggable Search Components described in SOLR-281 before tackling suggested MLT improvements.

Ryan McKinley added a comment - 18/Nov/07 09:36 PM
In SOLR-281, DisMaxRequestHandler became a subclass of StandardRequestHandler and both include the MoreLikeThis query component.

Hoss Man added a comment - 15/Apr/08 11:28 PM
This bug was modified as part of a bulk update using the criteria...
  • Currently marked ("Resolved" or "Closed") and "Fixed"
  • Had no "Fix Version" versions
  • "Affects Versions" included 1.3

The Fix Version for all 8 issues found was set to 1.3 (1.3 has not yet been released, if an issue is already fixed, and it affected 1.3 then the fix will be in 1.3)

Email notification was suppressed to prevent excessive email.

For a list of all the issues modified, search jira comments for this
(hopefully) unique string: 20080415hossman1