Issue Details (XML | Word | Printable)

Key: SOLR-30
Type: New Feature New Feature
Status: Closed Closed
Resolution: Duplicate
Priority: Minor Minor
Assignee: Unassigned
Reporter: Philip Jacob
Votes: 0
Watchers: 0
Operations

If you were logged in you would be able to see more operations.
Solr

Java client code for performing searches against a Solr instance

Created: 16/Jul/06 04:41 AM   Updated: 14/Jan/07 11:29 PM
Return to search
Component/s: clients - java
Affects Version/s: None
Fix Version/s: None

Time Tracking:
Not Specified

File Attachments:
  Size
Zip Archive Licensed for inclusion in ASF works solrsearcher-client.zip 2006-07-16 04:41 AM Philip Jacob 4 kB
Issue Links:
Incorporates
 

Resolution Date: 14/Jan/07 11:29 PM


 Description  « Hide
Here are a few classes that connect to a Solr instance to perform searches. Results are returned in a Response object. The Response encapsulates a List<Map<String,Field>> that gives you access to the key data in the results. This is the main part that I'm looking for comments on.

There are 2 dependencies for this code: JDOM and Commons HttpClient. I'll remove the JDOM dependency in favor of regular DOM at some point, but I think that the HttpClient dependency is worthwhile here. There's a lot that can be exploited with HttpClient that isn't demonstrated in this class. The purpose here is mainly to get feedback on the API of SolrSearcher before I start optimizing anything.



 All   Comments   Work Log   Change History   Subversion Commits      Sort Order: Ascending order - Click to sort in descending order
Philip Jacob added a comment - 16/Jul/06 04:41 AM
Attachment contains:

Length Date Time Name
-------- ---- ---- ----
804 07-16-06 00:37 solr-trunk/src/java/org/apache/solr/client/Field.java
1337 07-16-06 00:37 solr-trunk/src/java/org/apache/solr/client/Response.java
390 07-16-06 00:37 solr-trunk/src/java/org/apache/solr/client/SearchException.java
5873 07-16-06 00:37 solr-trunk/src/java/org/apache/solr/client/SolrSearcher.java
-------- -------
8404 4 files


Darren Erik Vengroff added a comment - 16/Jul/06 06:11 PM
Hi Phil,

Thanks for posting this code. I look forward to having a Java search client be a part of Solr.

I took an initial look through the code and have a few comments.

1. On the interface side, I like the general idea, but I think it could be extended a bit. Specifically, one of Solr's strengths is that it knows the type of the fields in each document. I'd like to see a hierarchy of Field classes that capture this. Right now, a field value is always a String, even though the Solr server comes back with things like

<str name="id">45</str>
<bool name="inStock">true</bool>
<float name="price">19.95</float>
...

that clearly indicate the type of each field that comes back. The SolrSearcher code ignores the type when it pulls out the value of each triple. This leaves the application that's using the SolrSearcher to have to maintain some kind of knowledge about the server-side schema, and keep that knowledge in sync with any changes on the server side.

2. I'd like to see the dependencies on JDOM and the Commons HttpClient removed if possible. The fewer external dependencies there are, the more broadly this code can be adopted.

3. I don't quite understand the use of the Integer type in many of the fields in Response. Is there a reason they can't just be ints? I see that in SolrSearcher you are using the constructor Integer(String) to parse String attribute values. But that doesn't mean you need to store the result as an Integer. Indeed you could use Integer.parseInt(String) instead and never have to construct a new object at all.


Philip Jacob added a comment - 16/Jul/06 07:43 PM
Thanks for the comments. Here are my thoughts:

1) Good point. One approach for doing this would be to what Commons Configuration does. I could add methods to Field to perform getValueAsInteger(), getValueAsBoolean(), etc. These are basically just convenient methods. The other approach would be to change Field.value to Object instead of String. And then it's up to the client code to figure out what Object is (presumably using instanceof). So while I agree with your idea, I'm not sure what people think the best way to do this is.

2) JDOM - agreed. I just did it this way because writing DOM code takes me five times longer But, yes, it should go. Commons HttpClient, on the other hand, has a lot of useful stuff like multhreaded connection management and connection persistence. Even in medium-volume situations, these optimizations will make a difference. The prospect of implementing a subset of the functionality in Commons HttpClient is not an enviable task.

3) Actually, that was a question I had. Are those fields always guaranteed to be there in Solr's response? If not, then they ought to able to contain null so that means they could be Integers. If Solr guarantees that these fields will always be in the response, then they definitely could be ints.

Other thoughts?


Darren Erik Vengroff added a comment - 17/Jul/06 01:07 AM
I'm sure you know a lot more than Commons HttpClient than I do, so I'll defer to your judgement on whether the additional functionality is worth the additional dependency. Some questions in the hope that I can learn more about this and maybe switch to Commons for other stuff I'm doing:

1. On the persistence point, does Commons HttpClient do something beyond reusing the same TCP connection, which is the default behavior of the JDK? See e.g. http://java.sun.com/j2se/1.5.0/docs/guide/net/http-keepalive.html.

2. On the threading point, if two threads get seperate HttpURLConnection objects by independently calling java.net.URL.getConnection() on the same URL, can they then stomp on each other by using those simultaneously? The HttpURLConnection objects are different, so I would hope not. I know that the reuse of TCP connections happens behind the scenes in some static cache hidden away inside java.net.HttpURLConnection, but I wasn't aware of it not being thread safe. Is this the issue Commons HttpClient is trying to address, or is it some other issue and I'm just missing the point?


Yonik Seeley added a comment - 17/Jul/06 08:24 PM
Hi Phil, thanks for the code!

Solr's response is very generic... Solr supports custom query handlers than can return arbitrary data like category counts for faceted browsing, context snippets with highlight info, multiple query result sets, etc.
Just recently highlighting info was added to the standard and dismax query handlers. It would be nice to have a way to access more generic responses.

Your Response class pretty much maps to a DocList (a list of documents that match a query) or the <response> element in the XML. It's possible to have multiple of these.
The <header> containing status and qtime will only appear once.

Whatever mapping you come up with, there will be some peope that want something different. There should probably be some low level methods that allow one to get the InputStream or Reader of the response. This could be important, for example, if someone is asking for all the docs in the index and needs to stream the results.

What relationship should the query client have with the update client? Probably makes sense for them to at least use the same HTTP client, even if they don't share any implementation. Should they be in the same solrclient.jar, or different solrupdater.jar, solrquery.jar?


John Hodgson added a comment - 12/Sep/06 04:16 PM
Hi Phil. I'm using your search client and it is working pretty well. We did notice on thing that appears incorrect. The sort mechanism being performed by the client adds a request parameter before sending to lucene.

/solr/select?q=term&sort=name+asc
GET /solr/select?q=p*+AND+entryType%3Atag&sort=name+asc&rows=100

According to lucene docs (http://incubator.apache.org/solr/tutorial.html#Sorting), the sort '''should be''' appended after the query term:

/solr/select?q=term;name+asc
GET /solr/select?q=p*+AND+entryType%3Atag%3Bname+asc&rows=100


Fuad Efendi added a comment - 06/Nov/06 10:23 PM
I believe we could have some interfaces and their implementations (HttpClient, java.net.*, JDOM), the way it was done in Nutch.
Just as a good sample of wide acceptance: Commons HttpClient could be easily configured via Spring Framework's Dependency Injection... The main method in SolrSearcher could be made abstract (in Interface), and another class could be HttpClient-specific

Fuad Efendi added a comment - 20/Nov/06 12:44 AM
Hi Philip,

Many thanks for posting the sample, just a few (new) thoughts after getting more familiar with SOLR and Cocoon...

Background: "HTTP interface with configurable response formats (XML/XSLT, JSON, Python, Ruby)"
Requirements: "HTTP interface supporting pure Java clients"

Am I right?

So, in this case preferable way should be Java-over-HTTP transport layer, instead of HttpClient+XmlParser... your sample is simply Java-over-XML-over-HTTP (why not over JSON, or even CSV?)

Probably RMI-IIOP is the answer (which is Java-RMI-over-HTTP), but I'd prefer XSL/XML anyway... JSON is better than XML in case of AJAX, XML is preferable for 'server-side' transformations...


Thorsten Scherler added a comment - 26/Dec/06 12:19 PM
Hi all,

I had a look at the code and I do not understand a couple of things.

Since the client can request any response format by defining it in the query string I am not sure whether the
protected Response createResponse(final String _xml, final List<String> _fields) throws SAXException, IOException, ParserConfigurationException, JDOMException {
makes so much sense at all.

IMO the java client should make it easy to search a solr server with an e.g. custom servlet. This way we could leverage all helper classes to connect to the server into the client. What format will be returned depends on the type defined in the query string that is the reason why I do not thing the JDOM stuff makes sense.

Further the different "public Response search" methods lead is IMO not generic enough, why not simply use
public Response search(String _query, List<NameValuePair> params) { ... }
and returning directly the solr response. Then the calling method would need to deal with the raw response.


giri added a comment - 27/Dec/06 08:32 PM
Hi,
I just downloaded this code and compiled it, I assume I need to use the SolrSearcher methods. I tired the following example and getting error

SolrSearcher solrSearcher = new SolrSearcher("http://localhost:8090/solr/select/");
List filedsList = new ArrayList();
filedsList.add("abstract");
filedsList.add("origin");

Response res = solrSearcher.search("water", filedsList, "String");

When I run the above code, I am getting the following error:

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/commons/codec/DecoderException
at org.apache.commons.httpclient.HttpMethodBase.<init>(HttpMethodBase.java:218)
at org.apache.commons.httpclient.methods.GetMethod.<init>(GetMethod.java:88)
at search.SolrSearcher.search(SolrSearcher.java:104)
at search.SolrSearcher.search(SolrSearcher.java:71)
at search.TestSolr.main(TestSolr.java:21)

Any help is appreciated.

Thanks!


giri added a comment - 27/Dec/06 08:58 PM
It worked, I needed to explicitly list the commons codec jar file on the classpath

Otis Gospodnetic added a comment - 12/Jan/07 11:37 AM
I think this issue is now out of date - it looks like moved everything to SOLR-20. Close this?

Hoss Man added a comment - 14/Jan/07 11:29 PM
officially linking to SOLR-20 since that issue is subsumming this one