Details
- Type: New Feature
- Status: Open
- Priority: Minor
- Resolution: Unresolved
Description
It is currently possible to call the search handler from a streaming expression ("search(...)"), but it is not currently possible to invoke a streaming expression from a regular query in the search handler. In some cases, it would be useful to use a streaming expression to generate a result set and then join that result set with a normal set of search results.
This isn't expected to be particularly efficient for high-cardinality streaming expression results, but it would be a pretty powerful feature that could enable a number of use cases that aren't possible today within a normal search.
Example:
Docs:
curl -X POST -H "Content-Type: application/json" "http://localhost:8983/solr/food/update?commit=true" --data-binary '
[
  {"id": "1", "name_s":"donut","vector_fs":[5.0,0.0,1.0,5.0,0.0,4.0,5.0,1.0]},
  {"id": "2", "name_s":"apple juice","vector_fs":[1.0,5.0,0.0,0.0,0.0,4.0,4.0,3.0]},
  {"id": "3", "name_s":"cappuccino","vector_fs":[0.0,5.0,3.0,0.0,4.0,1.0,2.0,3.0]},
  {"id": "4", "name_s":"cheese pizza","vector_fs":[5.0,0.0,4.0,4.0,0.0,1.0,5.0,2.0]},
  {"id": "5", "name_s":"green tea","vector_fs":[0.0,5.0,0.0,0.0,2.0,1.0,1.0,5.0]},
  {"id": "6", "name_s":"latte","vector_fs":[0.0,5.0,4.0,0.0,4.0,1.0,3.0,3.0]},
  {"id": "7", "name_s":"soda","vector_fs":[0.0,5.0,0.0,0.0,3.0,5.0,5.0,0.0]},
  {"id": "8", "name_s":"cheese bread sticks","vector_fs":[5.0,0.0,4.0,5.0,0.0,1.0,4.0,2.0]},
  {"id": "9", "name_s":"water","vector_fs":[0.0,5.0,0.0,0.0,0.0,0.0,0.0,5.0]},
  {"id": "10", "name_s":"cinnamon bread sticks","vector_fs":[5.0,0.0,1.0,5.0,0.0,3.0,4.0,2.0]}
]'
Query:
http://localhost:8983/solr/food/select?q=*:*&fq={!streaming_expression}top(select(search(food,%20q=%22*:*%22,%20fl=%22id,vector_fs%22,%20sort=%22id%20asc%22),%20cosineSimilarity(vector_fs,%20array(5.2,0.0,1.0,5.0,0.0,4.0,5.0,1.0))%20as%20cos,%20id),%20n=5,%20sort=%22cos%20desc%22)&fl=id,name_s
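Hand-encoding a streaming expression into a URL is error-prone, so here is a minimal Python sketch of building the same kind of request. The helper name build_select_url is hypothetical, the streaming_expression parser name is taken from this ticket (the feature is not merged), and a shorter expression is used for brevity.

```python
from urllib.parse import urlencode, quote

def build_select_url(base_url, collection, expr, fl="id,name_s"):
    """Return a /select URL whose fq wraps `expr` in the proposed parser.

    Assumption: the parser is registered as `streaming_expression`,
    as in the example query of this ticket.
    """
    params = {
        "q": "*:*",
        "fq": "{!streaming_expression}" + expr,
        "fl": fl,
    }
    # quote_via=quote encodes spaces as %20 rather than '+'.
    return f"{base_url}/{collection}/select?" + urlencode(params, quote_via=quote)

# Simplified expression: every id in the collection, sorted by id.
expr = 'search(food, q="*:*", fl="id", sort="id asc")'
url = build_select_url("http://localhost:8983/solr", "food", expr)
print(url)
```

Letting the library do the percent-encoding keeps quotes, braces and spaces in the expression from being mangled by the shell or the HTTP client.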
Response:
{
  "responseHeader":{
    "zkConnected":true,
    "status":0,
    "QTime":7,
    "params":{
      "q":"*:*",
      "fl":"id,name_s",
      "fq":"{!streaming_expression}top(select(search(food, q=\"*:*\", fl=\"id,vector_fs\", sort=\"id asc\"), cosineSimilarity(vector_fs, array(5.2,0.0,1.0,5.0,0.0,4.0,5.0,1.0)) as cos, id), n=5, sort=\"cos desc\")"}},
  "response":{"numFound":5,"start":0,"docs":[
    { "name_s":"donut", "id":"1"},
    { "name_s":"apple juice", "id":"2"},
    { "name_s":"cheese pizza", "id":"4"},
    { "name_s":"cheese bread sticks", "id":"8"},
    { "name_s":"cinnamon bread sticks", "id":"10"}]
  }
}
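To see why these five documents pass the filter, the ranking can be reproduced offline. The sketch below recomputes cosine similarity in plain Python from the vectors in the Docs example and the query vector echoed in the response params. It assumes cosineSimilarity is the usual dot product divided by the product of the norms; the function names are illustrative and not part of Solr.

```python
import math

# vector_fs values from the Docs example, keyed by document id.
DOCS = {
    "1": [5.0, 0.0, 1.0, 5.0, 0.0, 4.0, 5.0, 1.0],   # donut
    "2": [1.0, 5.0, 0.0, 0.0, 0.0, 4.0, 4.0, 3.0],   # apple juice
    "3": [0.0, 5.0, 3.0, 0.0, 4.0, 1.0, 2.0, 3.0],   # cappuccino
    "4": [5.0, 0.0, 4.0, 4.0, 0.0, 1.0, 5.0, 2.0],   # cheese pizza
    "5": [0.0, 5.0, 0.0, 0.0, 2.0, 1.0, 1.0, 5.0],   # green tea
    "6": [0.0, 5.0, 4.0, 0.0, 4.0, 1.0, 3.0, 3.0],   # latte
    "7": [0.0, 5.0, 0.0, 0.0, 3.0, 5.0, 5.0, 0.0],   # soda
    "8": [5.0, 0.0, 4.0, 5.0, 0.0, 1.0, 4.0, 2.0],   # cheese bread sticks
    "9": [0.0, 5.0, 0.0, 0.0, 0.0, 0.0, 0.0, 5.0],   # water
    "10": [5.0, 0.0, 1.0, 5.0, 0.0, 3.0, 4.0, 2.0],  # cinnamon bread sticks
}

# Query vector as echoed in the response params.
QUERY = [5.2, 0.0, 1.0, 5.0, 0.0, 4.0, 5.0, 1.0]

def cosine(a, b):
    """Cosine similarity: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_ids(docs, query, n=5):
    """Ids of the n documents most similar to query, best first."""
    ranked = sorted(docs, key=lambda doc_id: cosine(docs[doc_id], query), reverse=True)
    return ranked[:n]

matches = top_ids(DOCS, QUERY)
print(sorted(matches, key=int))
```

The streaming expression only contributes a set of ids to the filter query, which would explain why the response lists the matches in id order rather than by similarity.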
The current implementation also supports the following additional parameters:
f: (optional) The field name from the streaming expression containing the document ids upon which to filter. Defaults to the same uniqueKey field name from your documents.
method: (optional) One of termsFilter (default), booleanQuery, automaton, docValuesTermsFilter.
The method parameter may go away, especially if we find a more efficient way to join the stream to the main query's doc set.
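As a rough illustration of how these parameters might be supplied, the Python sketch below assembles the fq value. It assumes f and method are passed as ordinary Solr local params inside the braces, which this ticket does not state explicitly; the helper streaming_fq is hypothetical.

```python
# Method names listed in this ticket; termsFilter is the stated default.
VALID_METHODS = ("termsFilter", "booleanQuery", "automaton", "docValuesTermsFilter")

def streaming_fq(expr, f=None, method=None):
    """Build an fq string for the proposed streaming_expression parser.

    Assumption: `f` and `method` are given as local params, for example
    {!streaming_expression f=id method=automaton}<expression>.
    """
    local_params = ["!streaming_expression"]
    if f is not None:
        local_params.append(f"f={f}")
    if method is not None:
        if method not in VALID_METHODS:
            raise ValueError(f"unknown method: {method}")
        local_params.append(f"method={method}")
    return "{" + " ".join(local_params) + "}" + expr

expr = 'search(food, q="*:*", fl="id", sort="id asc")'
print(streaming_fq(expr))
print(streaming_fq(expr, f="id", method="booleanQuery"))
```

Omitting both arguments yields the plain form used in the example query, which relies on the uniqueKey field and the default method.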