Hi Phil,
Thanks for posting this code. I look forward to having a Java search client be a part of Solr. I took an initial look through the code and have a few comments.

1. On the interface side, I like the general idea, but I think it could be extended a bit. Specifically, one of Solr's strengths is that it knows the type of the fields in each document. I'd like to see a hierarchy of Field classes that captures this. Right now, a field value is always a String, even though the Solr server comes back with things like <str name="id">45</str> that clearly indicate the type of each field. The SolrSearcher code ignores the type when it pulls out the value of each triple. This forces the application using the SolrSearcher to maintain its own knowledge of the server-side schema, and to keep that knowledge in sync with any changes on the server side.

2. I'd like to see the dependencies on JDOM and the Commons HttpClient removed if possible. The fewer external dependencies there are, the more broadly this code can be adopted.

3. I don't quite understand the use of the Integer type in many of the fields in Response. Is there a reason they can't just be ints? I see that in SolrSearcher you are using the constructor Integer(String) to parse String attribute values. But that doesn't mean you need to store the result as an Integer. Indeed you could use Integer.parseInt(String) instead and never have to construct a new object at all.

Thanks for the comments. Here are my thoughts:
1) Good point. One approach would be to do what Commons Configuration does: I could add methods to Field such as getValueAsInteger(), getValueAsBoolean(), etc. These are basically just convenience methods. The other approach would be to change Field.value to Object instead of String, and then it's up to the client code to figure out what the Object is (presumably using instanceof). So while I agree with your idea, I'm not sure which way people think is best.

2) JDOM - agreed. I just did it this way because writing DOM code takes me five times longer.

3) Actually, that was a question I had. Are those fields always guaranteed to be there in Solr's response? If not, then they ought to be able to contain null, which means they should be Integers. If Solr guarantees that these fields will always be in the response, then they definitely could be ints.

Other thoughts?

I'm sure you know a lot more about Commons HttpClient than I do, so I'll defer to your judgement on whether the additional functionality is worth the additional dependency. Some questions in the hope that I can learn more about this and maybe switch to Commons for other stuff I'm doing:
1. On the persistence point, does Commons HttpClient do something beyond reusing the same TCP connection, which is the default behavior of the JDK? See e.g. http://java.sun.com/j2se/1.5.0/docs/guide/net/http-keepalive.html

2. On the threading point, if two threads get separate HttpURLConnection objects by independently calling java.net.URL.openConnection() on the same URL, can they then stomp on each other by using those simultaneously? The HttpURLConnection objects are different, so I would hope not. I know that the reuse of TCP connections happens behind the scenes in some static cache hidden away inside java.net.HttpURLConnection, but I wasn't aware of it not being thread safe. Is this the issue Commons HttpClient is trying to address, or is it some other issue and I'm just missing the point?

Hi Phil, thanks for the code!
Solr's response is very generic... Solr supports custom query handlers that can return arbitrary data like category counts for faceted browsing, context snippets with highlight info, multiple query result sets, etc. Your Response class pretty much maps to a DocList (a list of documents that match a query) or the <response> element in the XML. It's possible to have more than one of these. Whatever mapping you come up with, there will be some people who want something different. There should probably be some low-level methods that allow one to get the InputStream or Reader of the response. This could be important, for example, if someone is asking for all the docs in the index and needs to stream the results.

What relationship should the query client have with the update client? It probably makes sense for them to at least use the same HTTP client, even if they don't share any implementation. Should they be in the same solrclient.jar, or in separate solrupdater.jar and solrquery.jar files?

Hi Phil. I'm using your search client and it is working pretty well. We did notice one thing that appears incorrect. The sort mechanism in the client adds a request parameter before sending to Lucene:
/solr/select?q=term&sort=name+asc

According to the Lucene docs (http://incubator.apache.org/solr/tutorial.html#Sorting), the sort should instead follow the query after a semicolon:

/solr/select?q=term;name+asc

I believe we could have some interfaces and their implementations (HttpClient, java.net.*, JDOM), the way it was done in Nutch.
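One way to picture that separation is the rough sketch below. It is only an illustration: `SolrTransport`, `JdkTransport`, `StubTransport`, and `Searcher` are hypothetical names that do not appear in the posted code, and it uses current Java syntax rather than what the original client targets. The searcher depends only on the interface, so a java.net-based or a Commons HttpClient-based class could be plugged in; the stub exists only so the example runs without a Solr server.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class TransportSketch {

    /** Hypothetical abstraction: return the raw response stream for a request URL. */
    interface SolrTransport {
        InputStream fetch(String url) throws IOException;
    }

    /** java.net-only implementation, so no external dependency is needed. */
    static class JdkTransport implements SolrTransport {
        @Override
        public InputStream fetch(String url) throws IOException {
            return new URL(url).openStream();
        }
    }

    /** Canned response so the sketch runs without a Solr server. */
    static class StubTransport implements SolrTransport {
        private final String body;

        StubTransport(String body) {
            this.body = body;
        }

        @Override
        public InputStream fetch(String url) {
            return new ByteArrayInputStream(body.getBytes(StandardCharsets.UTF_8));
        }
    }

    /** Searcher that depends only on the interface, not on any HTTP library. */
    static class Searcher {
        private final String baseUrl;
        private final SolrTransport transport;

        Searcher(String baseUrl, SolrTransport transport) {
            this.baseUrl = baseUrl;
            this.transport = transport;
        }

        /** Returns the unparsed response body; the query must already be URL-encoded. */
        String searchRaw(String encodedQuery) {
            try (InputStream in = transport.fetch(baseUrl + "?q=" + encodedQuery)) {
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                byte[] buf = new byte[4096];
                int n;
                while ((n = in.read(buf)) != -1) {
                    out.write(buf, 0, n);
                }
                return new String(out.toByteArray(), StandardCharsets.UTF_8);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }

    public static void main(String[] args) {
        // Swap StubTransport for JdkTransport to talk to a real server.
        Searcher searcher = new Searcher("http://localhost:8090/solr/select",
                new StubTransport("<response/>"));
        System.out.println(searcher.searchRaw("water"));
    }
}
```

Returning the raw body here also leaves room for callers who want the stream itself rather than a parsed Response.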
Just as a good example of wide acceptance: Commons HttpClient can easily be configured via Spring Framework's dependency injection... The main method in SolrSearcher could be made abstract (in an interface), and another class could be HttpClient-specific.

Hi Philip,
Many thanks for posting the sample. Just a few (new) thoughts after getting more familiar with Solr and Cocoon...

Background: "HTTP interface with configurable response formats (XML/XSLT, JSON, Python, Ruby)". Am I right? If so, the preferable way would be a Java-over-HTTP transport layer instead of HttpClient+XmlParser... Your sample is simply Java-over-XML-over-HTTP (why not over JSON, or even CSV?). Probably RMI-IIOP is the answer (which is Java-RMI-over-HTTP), but I'd prefer XSL/XML anyway... JSON is better than XML in the case of AJAX; XML is preferable for 'server-side' transformations...

Hi all,
I had a look at the code and I do not understand a couple of things. Since the client can request any response format by defining it in the query string I am not sure whether the

IMO the Java client should make it easy to search a Solr server with e.g. a custom servlet. This way we could put all the helper classes needed to connect to the server into the client. The format that is returned depends on the type defined in the query string, which is why I do not think the JDOM stuff makes sense. Further, the different "public Response search" methods are IMO not generic enough, why not simply use

Hi,
I just downloaded this code and compiled it. I assume I need to use the SolrSearcher methods. I tried the following example:

SolrSearcher solrSearcher = new SolrSearcher("http://localhost:8090/solr/select/");
Response res = solrSearcher.search("water", filedsList, "String");

When I run the above code, I get the following error:

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/commons/codec/DecoderException

Any help is appreciated. Thanks!

I think this issue is now out of date - it looks like moved everything to
Length Date Time Name
-------- ---- ---- ----
804 07-16-06 00:37 solr-trunk/src/java/org/apache/solr/client/Field.java
1337 07-16-06 00:37 solr-trunk/src/java/org/apache/solr/client/Response.java
390 07-16-06 00:37 solr-trunk/src/java/org/apache/solr/client/SearchException.java
5873 07-16-06 00:37 solr-trunk/src/java/org/apache/solr/client/SolrSearcher.java
-------- -------
8404 4 files
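
As an illustration of the typed-field idea raised in the discussion, the sketch below shows both options mentioned for Field: convenience accessors such as getValueAsInteger(), and an Object-returning accessor that client code can test with instanceof. It is not the contents of the attached Field.java. The class layout, the getTypedValue() name, the "popularity" field, and the choice of which type tags to map are all assumptions made for the example.

```java
public class TypedFieldSketch {

    /** Hypothetical Field that keeps the response's type tag next to the raw text. */
    static final class Field {
        private final String type;   // element name from the XML, e.g. "str" or "int"
        private final String name;
        private final String value;

        Field(String type, String name, String value) {
            this.type = type;
            this.name = name;
            this.value = value;
        }

        String getName() {
            return name;
        }

        String getValue() {
            return value;
        }

        // Option 1: convenience accessors; the caller decides which one applies.
        int getValueAsInteger() {
            return Integer.parseInt(value);
        }

        boolean getValueAsBoolean() {
            return Boolean.parseBoolean(value);
        }

        // Option 2: convert according to the type tag; unknown tags stay as String.
        Object getTypedValue() {
            switch (type) {
                case "int":    return Integer.valueOf(value);
                case "long":   return Long.valueOf(value);
                case "float":  return Float.valueOf(value);
                case "double": return Double.valueOf(value);
                case "bool":   return Boolean.valueOf(value);
                default:       return value;
            }
        }
    }

    public static void main(String[] args) {
        Field id = new Field("str", "id", "45");
        Field popularity = new Field("int", "popularity", "6"); // made-up field
        System.out.println(id.getTypedValue() instanceof String);
        System.out.println(popularity.getTypedValue() instanceof Integer);
        System.out.println(popularity.getValueAsInteger() + 1);
    }
}
```

With the tag kept on the Field, the client no longer needs its own copy of the server-side schema to know that a value is numeric.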