Description
This ticket allows Solr to export full sorted result sets. A new export request handler sets the default response writer (SortingResponseWriter) and the required rank query (ExportQParserPlugin). The syntax is:
/solr/collection1/export?q=*:*&fl=a,b,c&sort=a desc,b desc
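As a rough illustration, a client could assemble such a request URL as follows. This is only a sketch: the helper name build_export_url and the host and port are assumptions for the example, not part of this ticket.

```python
from urllib.parse import urlencode


def build_export_url(base_url, collection, query, fields, sort):
    """Build an /export URL. `sort` is a list of (field, direction) pairs.

    Hypothetical helper for illustration; only the q, fl and sort
    parameters shown in the ticket's syntax line are used.
    """
    params = {
        "q": query,
        "fl": ",".join(fields),
        "sort": ",".join(f"{field} {direction}" for field, direction in sort),
    }
    # Keep '*', ':' and ',' readable; spaces are encoded as '+'.
    return f"{base_url}/solr/{collection}/export?{urlencode(params, safe='*:,')}"


# Assumed local Solr address, for illustration only.
url = build_export_url(
    "http://localhost:8983", "collection1", "*:*",
    ["a", "b", "c"], [("a", "desc"), ("b", "desc")],
)
```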
This capability opens Solr up to a range of uses that have typically been handled by aggregation engines such as Hadoop. For example:
Large Distributed Joins
A client outside of Solr queries two different Solr collections, each of which returns its results sorted by a join key. The client iterates through both streams and performs a merge join.
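The client-side merge join could look roughly like the sketch below. It assumes each stream has already been parsed into an iterable of dicts sorted ascending by the join key (a descending sort would need the comparisons reversed), and that join keys are never None. The function name merge_join is hypothetical.

```python
from itertools import groupby
from operator import itemgetter


def merge_join(left, right, key):
    """Inner-join two iterables of dicts, both sorted ascending by `key`."""
    get = itemgetter(key)
    left_groups = groupby(left, key=get)
    right_groups = groupby(right, key=get)
    done = (None, None)
    left_key, left_rows = next(left_groups, done)
    right_key, right_rows = next(right_groups, done)
    while left_rows is not None and right_rows is not None:
        if left_key < right_key:
            left_key, left_rows = next(left_groups, done)
        elif left_key > right_key:
            right_key, right_rows = next(right_groups, done)
        else:
            # Keys match: emit every pairing of rows that share the key.
            buffered = list(right_rows)
            for left_row in left_rows:
                for right_row in buffered:
                    yield {**left_row, **right_row}
            left_key, left_rows = next(left_groups, done)
            right_key, right_rows = next(right_groups, done)


# Made-up sample data standing in for two exported collections.
people = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}, {"id": 4, "name": "d"}]
orders = [{"id": 2, "price": 10}, {"id": 2, "price": 20},
          {"id": 3, "price": 5}, {"id": 4, "price": 7}]
joined = list(merge_join(people, orders, "id"))
```

Only the rows sharing the current key on one side are buffered, so memory use depends on the largest key group rather than on the size of either result set.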
Fully Distributed Field Collapsing/Grouping
A client outside of Solr makes individual calls to all the servers in a single collection, each of which returns its results sorted by the collapse key. The client merge joins the sorted lists on the collapse key to perform the field collapse.
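One possible shape for the client-side collapse is sketched below. It assumes each server's response is available as an iterable of dicts sorted ascending by the collapse key, and that the group head is the document with the highest value in a score field. Both the function name collapse and the score field are assumptions for the example.

```python
import heapq
from itertools import groupby
from operator import itemgetter


def collapse(shard_streams, collapse_key, score_field="score"):
    """Yield one document per collapse key across all shard streams."""
    get = itemgetter(collapse_key)
    # heapq.merge lazily interleaves the already-sorted streams.
    merged = heapq.merge(*shard_streams, key=get)
    for _, group in groupby(merged, key=get):
        # Assumed policy: keep the highest-scoring document of each group.
        yield max(group, key=itemgetter(score_field))


# Made-up sample data standing in for two servers' sorted responses.
shard1 = [{"g": "a", "score": 1.0, "id": 1}, {"g": "b", "score": 3.0, "id": 2}]
shard2 = [{"g": "a", "score": 2.0, "id": 3}, {"g": "c", "score": 0.5, "id": 4}]
heads = [doc["id"] for doc in collapse([shard1, shard2], "g")]
```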
High Cardinality Distributed Aggregation
A client outside of Solr makes individual calls to all the servers in a single collection, requesting results sorted on a high-cardinality field. The client then merge joins the sorted lists to perform the high-cardinality aggregation.
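A streaming aggregation along these lines might look like the following sketch. It assumes ascending sort order and numeric values; the function name aggregate and the field names user and bytes are made up for the example.

```python
import heapq
from itertools import groupby
from operator import itemgetter


def aggregate(shard_streams, key_field, value_field):
    """Yield (key, count, total) per distinct key across sorted shard streams."""
    get = itemgetter(key_field)
    merged = heapq.merge(*shard_streams, key=get)
    for key, docs in groupby(merged, key=get):
        count = 0
        total = 0
        for doc in docs:
            count += 1
            total += doc[value_field]
        yield key, count, total


# Made-up sample data standing in for two servers' sorted responses.
shard1 = [{"user": "u1", "bytes": 10}, {"user": "u3", "bytes": 5}]
shard2 = [{"user": "u1", "bytes": 7}, {"user": "u2", "bytes": 1}]
totals = list(aggregate([shard1, shard2], "user", "bytes"))
```

Because the input arrives sorted, only the running count and total for the current key are held in memory, regardless of how many distinct keys exist.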
Large Scale Time Series Rollups
A client outside Solr makes individual calls to all servers in a collection and sorts on time dimensions. The client merge joins the sorted result sets and rolls up the time dimensions as it iterates through the data.
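As a sketch of such a rollup, the example below merges shard streams sorted ascending on (year, month, day) and emits day and month subtotals in a single pass. The field names year, month, day and value, and the function name rollup_by_day_and_month, are assumptions for illustration.

```python
import heapq
from itertools import groupby


def day_key(doc):
    return (doc["year"], doc["month"], doc["day"])


def month_key(doc):
    return (doc["year"], doc["month"])


def rollup_by_day_and_month(shard_streams):
    """Yield ('day', (y, m, d), total) rows, then a ('month', (y, m), total) row."""
    merged = heapq.merge(*shard_streams, key=day_key)
    for month, month_docs in groupby(merged, key=month_key):
        month_total = 0
        # Days are contiguous within a month because of the sort order.
        for day, day_docs in groupby(month_docs, key=day_key):
            day_total = sum(doc["value"] for doc in day_docs)
            month_total += day_total
            yield ("day", day, day_total)
        yield ("month", month, month_total)


# Made-up sample data standing in for two servers' sorted responses.
shard1 = [{"year": 2014, "month": 1, "day": 1, "value": 2},
          {"year": 2014, "month": 2, "day": 1, "value": 5}]
shard2 = [{"year": 2014, "month": 1, "day": 1, "value": 3},
          {"year": 2014, "month": 1, "day": 2, "value": 4}]
rows = list(rollup_by_day_and_month([shard1, shard2]))
```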
In these scenarios Solr is being used as a distributed sorting engine. Developers can write clients that take advantage of this sorting capability in any way they wish.
Session Analysis and Aggregation
A client outside Solr makes individual calls to all servers in a collection and sorts on the sessionID. The client merge joins the sorted results and aggregates sessions as it iterates through the results.
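A minimal sketch of session aggregation follows. It assumes each document carries a sessionID and a numeric timestamp field, and that streams are sorted ascending by sessionID. The timestamp field and the function name aggregate_sessions are hypothetical.

```python
import heapq
from itertools import groupby
from operator import itemgetter


def aggregate_sessions(shard_streams):
    """Yield one summary dict per session across sorted shard streams."""
    by_session = itemgetter("sessionID")
    merged = heapq.merge(*shard_streams, key=by_session)
    for session_id, events in groupby(merged, key=by_session):
        # Only the current session's timestamps are held in memory.
        timestamps = [event["timestamp"] for event in events]
        yield {
            "sessionID": session_id,
            "events": len(timestamps),
            "duration": max(timestamps) - min(timestamps),
        }


# Made-up sample data standing in for two servers' sorted responses.
shard1 = [{"sessionID": "s1", "timestamp": 100}, {"sessionID": "s2", "timestamp": 50}]
shard2 = [{"sessionID": "s1", "timestamp": 160}, {"sessionID": "s3", "timestamp": 10}]
sessions = list(aggregate_sessions([shard1, shard2]))
```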