Details
-
Improvement
-
Status: Open
-
Major
-
Resolution: Unresolved
-
4.0-ALPHA
-
None
-
None
-
any
Description
It would be useful to have Solr return per-document results instead of a generic SolrException when multiple documents are being passed via CommonsHttpSolrServer.The API supports adding multiple streams/files to a request (see SOLR-3010 for an example usage in jython) but when an error is detected, an exception is returned to the caller and the caller must then determine which document failed to be processed. This is particularly problematic for simple document extraction when using solr and tika to pre-process documents for indexing. In this case, a batch of documents is passed to solr for processing by tika. If any of the documents fails to be processed, a SolrException is thrown:
Mon Jan 9 18:04:50 2012 Caught SolrException handling documents [13356414, 23590833, 33917483] (<jclass org.apache.solr.common.SolrException 9>, org.apache.solr.common.SolrException: org.apache.tika.exception.TikaException: TIKA-198: Illegal IOException from org.apache.tika.parser.microsoft.TNEFParser@6d893ae8 org.apache.solr.common.SolrException: org.apache.tika.exception.TikaException: TIK
Instead of this exception, the API could be configured to return a response that has a result per-document indicating the server's response for processing of the batch. A caller could then check the response and extract the relevant parsed content for successful documents and do special handling for documents that failed to be parsed.
There are reasonable workarounds for this in the current product. First, callers can pass 1 document at a time for processing and then there is no ambiguity on what the result is for a document. Another approach is to pass a small batch of documents to Solr/Tika and if an exception is thrown, reprocess the documents one at a time. If the corpus of documents is largely well-behaved, minimal retries will be needed to reprocess failures.
Attachments
Issue Links
- relates to
-
SOLR-3382 Finegrained error propagation (focus on multi-document updates)
- Open