|
I really like this modular approach to handling search requests, it will greatly simplify the process of adding new functionality (e.g. collapsing, faceting, more-like-this) to existing handlers without the need for unnecessary code replication. My primary goal is to extend the more-like-this handler capabilities and make them available to other handlers (such as dismax), and I think the proposed solution is a good approach.
Some issues that I can forsee though are: 1) Ordering: its fairly obvious that certain handlers need to be called before others (e.g. standard / dismax query parsing before faceting / highlighting) however there may be cases where the required sequence of events may be more subtle (e.g. faceting the results of a more-like-this query). There probably needs to be some mechanism to determine the order in which the components are prepared / processed. 2) Dependancy: a situation may arise where a component depends on operations performed by another component (e.g. more-like-this may take advantage of the dismax 'bq' parameter), perhaps there needs to be some method of specifying component dependency so that the SearchHandler can load and process required components automatically? I hope this make sense, I'm fairly new to Solr development so I'm afraid my contributions to this issue would be mostly limited to (hopefully helpful) ideas and suggestions however I'm happy to tinker with the patched code from above and help test this new component framework as it is developed. cheers, > Is anybody working on this issue? not actively. > if not, I can start looking at it. Of course! updatedprevious patch to work against trunk (r562043)
Following changes from the previous patch:-
-Supported components in DisMaxRequestHandler. Extended StandardRequestHandler and DisMaxRequestHandler from SearchHandler. -Deprecated StandardRequestHandler and DisMaxRequestHandler -Components configured via solrconfig.xml -Added DisMaxResponseBuilder, DisMaxQueryComponent and DisMaxDebugComponent classes Built and tested successfully with junit. I just tried this patch on svn trunk (r566899) and got the following failures:
$ patch -p0 < ../ I suspect it is the changes made by thanks, Fixed the patch to work with the trunk.
Thanks Sharad! this is looking good.
I just updated it to work with trunk and cleaned up a few things.
Yonik and/or Hoss... if you have time to check this out, that would be great. I'm having trouble applying the latest patch to trunk (r575809) again:
$ patch -p0 < ../ It also looks like the additions to solrconfig.xml have not been included in the latest patch either. I was also going to suggest that it might be a good idea to support class shorthand notation, so org.apache.solr.handler.component.* can be written solr.component.* in solrconfig.xml. updated to work with trunk.
Added the lost solrconfig.xml example: <requestHandler name="/search" class="solr.SearchHandler">
Re class aliases, it should work fine to put: updated to work with trunk (587079);
updated HighlightComponent to use SolrHighlighter. junit 0 failures/0 errors on Solaris 10 & WinXP. I've tried and failed to overcome the diff/patch rejects that Pieter experienced. StandardRequestHandler.java generates rejects; replace the whole class definition with the '-' prefixed lines of the reject. WARNING since this is a 'unified' patch and not wanting to mess with Sharad & Ryan's patch versions, I attached the patch as solr-281.patch (not as FYI, I'm in the process of updating this patch to work with trunk again (pluggable query parser stuff...
Just a refresh on the latest trunk (587494) to keep things working before we got Yonik's proper version.
junit 0 errors 0 failures. Same patch production process. Here's a new (smaller) patch that utilizes pluggable query parsers, and
Thoughts on these changes? We need to think through all the members of ResponseBuilder carefully and decide what component sets/reads them in what phase (and if that makes the most sense). Should ResponseBuilder have methods instead of members? If so, that would allow a component to perhaps even replace the ResponseBuilder and delegate to the original. How will a users custom component get configuration? Should components be a full plugins with an init() for configuration? Yonik, i could not get your last patch to apply cleanly to trunk. This is my best attempt to resolve the conflicts and fix a few things.
1. remove the call to getResponseBuilder() in SearchComponent
The dismax changes look great - i like the new query parser stuff! > - removed variable RequestBuilder class logic since it seems unnecessary... If at all possible, I think we should do our best to avoid depending on Map<String> for an interface. > yes > yes, that makes sense. How would a component replace the ResponseBuilder? If it could do that, there is obviously no need for the variable RequestBuilder class logic > As is, they are defined in a "components" section for SearchHandler (from the example solrconfig.xml): <requestHandler name="/search" class="solr.SearchHandler"> Perhaps the components should be passed the init params? Unless there is a compelling reason, I don't think components need to be shared across request handlers thus justifying a top level component registry.
That's OK, the one you produced fails for me on a clean checkout too.... looks like maybe we hit a little corner case with patch/diff. I'll see if I can jiggle things a bit to get better patch behavior first...
OK, uploading the one that works for me now.
Hold on... this patch is double in size from the last for some reason.... deleting.
OK, this patch should do it... sorry for the noise.
ok, that one works (or close enough)
> If you feel comfortable with the big picture, yes, I think we should commit this and refine the ResponseBuilder and perhaps the configuration options in smaller patches. Here is an updated version of your patch that converts the ResponseBuilder public variables to private vars with get/set methods.
How do you feel about committing this and working out the ResponseBuilder details in subsequent smaller patches? Not sure process-wise whether votes/watchers got a say, my apologies if not; +1 otherwise.
> How do you feel about committing this and working out the ResponseBuilder details in subsequent smaller patches?
+1 In this case I think the current patch represents the right direction, seems to work fine, and committing now will save effort of people working on other patches. The core changes are committed in rev594268.
I will resolve this issue and open a new issue to track cleaning up the ResponseBuilder Lets track ResponseBuilder issues with
The SearchComponent framework needs to allow for custom component initialization – I previously believed this could be done from within the RequestHandler NamedList args. In trying to use this for a custom component that needs to load a resource file, I found it is woefully inadequate.
As Yonik suggested a while back (and I disagreed with), I now think components should be loaded and initialized the same way as RequestHandlers. That is, a top level name based registry in SolrCore that SearchHandlers can share. For example, solrconfig.xml could declare: <searchComponent name="query" class="org.apache.solr.handler.component.QueryComponent" /> <searchComponent name="facet" class="org.apache.solr.handler.component.FacetComponent" /> <searchComponent name="mlt" class="org.apache.solr.handler.component.MoreLikeThisComponent" /> <searchComponent name="highlight" class="org.apache.solr.handler.component.HighlightComponent" /> <searchComponent name="debug" class="org.apache.solr.handler.component.DebugComponent" /> and a SearchHandler can pick and choose what components are used with: <requestHandler name="/search" class="solr.SearchHandler"> <arr name="components"> <str>query</str> <str>facet</str> <str>mlt</str> <str>highlight</str> <str>debug</str> </arr> </requestHandler> Also, there should be a way to configure components to run before or after the standard component list without having to know and maintain what the "standard" list is. For example, when we add field collapsing to the 'standard' options, it should not require everyone to update their solrconfig.xml When This patch uses
This makes SearchComponents a top level plugin (just like RequestHandlers) that get registered to a unique name in SolrCore. A SearchHandler can configure the component chain using: <requestHandler name="/search" class="solr.SearchHandler"> <lst name="defaults"> <str name="echoParams">explicit</str> </lst> <!-- By default, this will register the following components: <arr name="components"> <str>query</str> <str>facet</str> <str>mlt</str> <str>highlight</str> <str>debug</str> </arr> To insert handlers before or after the 'standard' components, use: <arr name="first-components"> <str>first</str> </arr> <arr name="last-components"> <str>last</str> </arr> --> </requestHandler> The other idea I had was numeric positioning... query=1000, facet=2000, etc, and so a user could add their component at any point. Perhaps your first/last is sufficient though.
The problem first/last is trying to solve is to let a custom handler automatically incorporate default changes to the standard components without editing solrconfig.xml.
Numeric positioning seems a bit brittle unless the 'standard' components have a locked position. Without opening the door to legacy problems, I don't see any good way to insert a component between "facet" and "mlt" without specifying the whole chain. updated initialization patch from
I am updating the distributed search patch (
I added the dist search components as, <searchComponent name="gstat" class="org.apache.solr.handler.federated.component.GlobalCollectionStatComponent" /> <requestHandler name="/search" class="solr.SearchHandler"> <arr name="last-components"> </requestHandler> But it was not working. On debugging i found that these added components were not getting registered. I made the following change in SolrCore.loadSearchComponents, // NamedListPluginLoader<SearchComponent> loader = new NamedListPluginLoader<SearchComponent>( xpath, searchComponents ); Sabyasachi - check that your patch does not comment out the loading line...
the committed version of this patch is not commented out: This is great; decomposing the handler and allowing the components to be wired up in the config really helps development (and maintenance of those changes).
For my purposes, I needed to make a change to the way the dismax query was being generated. Using the DisMaxQParserPlugin as a template, I created my own QParser and associated QParserPlugin; changed the relevant bits; added a <queryParser...> entry in solrconfig.xml; added the 'defType' parameter to the wanted SearchHandler configuration...and...all works well. Just a few comments:
That would require instantiation with reflection I think.
QParserPlugin is that interface essentially (except that its an class instead of an interface). For library maintainers an abstract class is preferred over an interface for things that a user will extend... that way signature changes can be made in a backward compatible manner.
Reflection is already being used to create the QParserPlugins (SolrCore:1027 and AbstractPluginLoader:83) - I'm guessing the reason for the plugin is just to avoid creating instances through reflection on every parse (as you could keep hold of the QParser class and call newInstance). The second point is moot, once you take away the need for createParser(...). It's really not that big-a-deal, in the scheme of things.
As an aside, method signature changes are usually trivial to fix; personally, the pain of those fixes is favourable to extending an abstract class unnecessarily. Followed up on solr-dev to avoid stealing more of this JIRA isse:
http://www.nabble.com/Re%3A--jira--Commented%3A-Search-Components-%28plugins%29-to15227648.html#a15227648 I'll mark this issue as "fixed" – I'm not sure why it was still open, the reason it was repoened has been resolved. Any new issues should get their own JIRA issue
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
It is mostly intended to flush out design issues.
This works by sticking a 'ResponseBuilder' in the context and sharing that between each component.