  1. Solr
  2. SOLR-5244

Exporting Full Sorted Result Sets

    Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 6.0
    • Fix Version/s: 4.10, 6.0
    • Component/s: search
    • Labels:
      None

      Description

      This ticket adds the ability to export full sorted result sets from Solr. A new export request handler sets the default response writer (SortingResponseWriter) and the required rank query (ExportQParserPlugin). The request syntax is:

      /solr/collection1/export?q=*:*&fl=a,b,c&sort=a desc,b desc
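
The request above can be assembled programmatically. The following is a minimal sketch in Python, assuming a Solr node at `http://localhost:8983` (a placeholder, not taken from this ticket). The helper name `build_export_url` is hypothetical. Only the `q`, `fl`, and `sort` parameters shown in the syntax are used, and the values are percent-encoded.

```python
from urllib.parse import urlencode


def build_export_url(collection, query, fields, sort,
                     base_url="http://localhost:8983"):
    """Build an /export URL; base_url is an assumed placeholder host."""
    params = urlencode({
        "q": query,
        "fl": ",".join(fields),  # comma-separated field list
        "sort": sort,            # e.g. "a desc,b desc"
    })
    return f"{base_url}/solr/{collection}/export?{params}"


url = build_export_url("collection1", "*:*", ["a", "b", "c"], "a desc,b desc")
```

The resulting URL decodes back to the same `q`, `fl`, and `sort` values as the hand-written request.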
      

      This capability opens Solr up to a range of uses that have typically been handled by aggregation engines such as Hadoop. For example:

      Large Distributed Joins

      A client outside of Solr queries two different Solr collections, each of which returns its results sorted by a join key. The client iterates through both streams and performs a merge join.
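
A client-side merge join of this kind might look like the sketch below. It is an illustration, not code from the patch. It assumes each stream has already been parsed into Python dicts and is sorted ascending by the join key (for a descending sort the comparisons would be reversed). The function name `merge_join` and the field names in the sample data are hypothetical.

```python
from itertools import groupby, product
from operator import itemgetter


def merge_join(left, right, key):
    """Inner-join two iterables of dicts, both sorted ascending by `key`."""
    get_key = itemgetter(key)
    left_groups = groupby(left, key=get_key)
    right_groups = groupby(right, key=get_key)
    left_item = next(left_groups, None)
    right_item = next(right_groups, None)
    while left_item is not None and right_item is not None:
        left_key, left_rows = left_item
        right_key, right_rows = right_item
        if left_key < right_key:
            left_item = next(left_groups, None)    # advance the smaller side
        elif left_key > right_key:
            right_item = next(right_groups, None)
        else:
            # Keys match: emit every pairing of rows sharing this key.
            for l_row, r_row in product(list(left_rows), list(right_rows)):
                yield {**l_row, **r_row}
            left_item = next(left_groups, None)
            right_item = next(right_groups, None)


orders = [{"id": 1, "a": "x"}, {"id": 2, "a": "y"}, {"id": 4, "a": "z"}]
items = [{"id": 2, "b": "p"}, {"id": 2, "b": "q"},
         {"id": 3, "b": "r"}, {"id": 4, "b": "s"}]
joined = list(merge_join(orders, items, "id"))
```

Because both inputs are sorted, each stream is read once and only the rows sharing the current key are held in memory.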

      Fully Distributed Field Collapsing/Grouping

      A client outside of Solr makes individual calls to all the servers in a single collection, each of which returns its results sorted by the collapse key. The client merge joins the sorted lists on the collapse key to perform the field collapse.
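
One way a client could perform this collapse is sketched below, assuming each server's results are already parsed into dicts and sorted ascending by the collapse field. The function `collapse`, the `pick` callback, and the sample field names `g` and `p` are hypothetical. The ticket does not specify how the representative document for a group is chosen, so that choice is left to the caller.

```python
import heapq
from itertools import groupby
from operator import itemgetter


def collapse(shard_results, collapse_field, pick=lambda docs: docs[0]):
    """Merge sorted per-shard lists and keep one document per collapse key."""
    get_key = itemgetter(collapse_field)
    # heapq.merge lazily interleaves inputs that are each sorted by the key.
    merged = heapq.merge(*shard_results, key=get_key)
    for _, docs in groupby(merged, key=get_key):
        yield pick(list(docs))


shard1 = [{"g": "a", "p": 3}, {"g": "b", "p": 1}]
shard2 = [{"g": "a", "p": 2}, {"g": "c", "p": 5}]
cheapest = list(collapse([shard1, shard2], "g",
                         pick=lambda docs: min(docs, key=itemgetter("p"))))
```

Only one group is materialized at a time, so memory use is bounded by the largest group rather than by the full result set.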

      High Cardinality Distributed Aggregation

      A client outside of Solr makes individual calls to all the servers in a single collection and sorts on a high cardinality field. The client then merge joins the sorted lists to perform the high cardinality aggregation.
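
The aggregation step can be done in a single pass with constant memory per key, since all documents for a key arrive together in the merged stream. The sketch below assumes ascending sort order and a count-and-sum aggregate. The names `aggregate_sorted`, `u`, and `v` are hypothetical and not part of the ticket.

```python
import heapq
from operator import itemgetter


def aggregate_sorted(shard_results, key_field, value_field):
    """Yield (key, count, total) per distinct key from sorted shard streams."""
    merged = heapq.merge(*shard_results, key=itemgetter(key_field))
    current_key, count, total = None, 0, 0
    for doc in merged:
        key = doc[key_field]
        if count and key != current_key:
            # The key changed, so the previous key is complete.
            yield current_key, count, total
            count, total = 0, 0
        current_key = key
        count += 1
        total += doc[value_field]
    if count:
        yield current_key, count, total  # flush the final key


shard_a = [{"u": "a", "v": 1}, {"u": "c", "v": 2}]
shard_b = [{"u": "a", "v": 3}, {"u": "b", "v": 4}]
totals = list(aggregate_sorted([shard_a, shard_b], "u", "v"))
```

No hash table of keys is needed, which is what makes the approach practical when the field has very many distinct values.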

      Large Scale Time Series Rollups

      A client outside Solr makes individual calls to all servers in a collection and sorts on time dimensions. The client merge joins the sorted result sets and rolls up the time dimensions as it iterates through the data.
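
A rollup over merged, time-sorted streams might be sketched as follows. It assumes timestamps are uniform ISO 8601 UTC strings such as `2014-08-01T10:05:00Z`, so that lexical order matches chronological order and a bucket can be obtained by truncating the string. The names `rollup`, `ts`, and `n` are hypothetical, and summing is just one possible rollup function.

```python
import heapq
from itertools import groupby

# Number of leading characters of an ISO 8601 timestamp that identify a bucket.
BUCKET_PREFIX = {"day": 10, "hour": 13, "minute": 16}


def rollup(shard_results, time_field, value_field, granularity="hour"):
    """Sum value_field per time bucket across shard streams sorted by time."""
    width = BUCKET_PREFIX[granularity]
    merged = heapq.merge(*shard_results, key=lambda d: d[time_field])
    for bucket, docs in groupby(merged, key=lambda d: d[time_field][:width]):
        yield bucket, sum(d[value_field] for d in docs)


shard1 = [{"ts": "2014-08-01T10:05:00Z", "n": 1},
          {"ts": "2014-08-01T11:30:00Z", "n": 2}]
shard2 = [{"ts": "2014-08-01T10:45:00Z", "n": 5},
          {"ts": "2014-08-02T00:00:00Z", "n": 7}]
hourly = list(rollup([shard1, shard2], "ts", "n"))
daily = list(rollup([shard1, shard2], "ts", "n", granularity="day"))
```

Because the merged stream is in time order, each bucket is complete as soon as the first document of the next bucket appears.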

      In these scenarios Solr is being used as a distributed sorting engine. Developers can write clients that take advantage of this sorting capability in any way they wish.

      Session Analysis and Aggregation

      A client outside Solr makes individual calls to all servers in a collection and sorts on the sessionID. The client merge joins the sorted results and aggregates sessions as it iterates through the results.

        Attachments

        1. SOLR-5244.patch
          61 kB
          Joel Bernstein
        2. SOLR-5244.patch
          61 kB
          Joel Bernstein
        3. SOLR-5244.patch
          59 kB
          Joel Bernstein
        4. SOLR-5244.patch
          28 kB
          Joel Bernstein
        5. SOLR-5244.patch
          26 kB
          Joel Bernstein
        6. SOLR-5244.patch
          15 kB
          Joel Bernstein
        7. 0001-SOLR_5244.patch
          15 kB
          Lianyi Han
        8. SOLR-5244.patch
          14 kB
          Joel Bernstein

            People

            • Assignee:
              joel.bernstein Joel Bernstein
              Reporter:
              joel.bernstein Joel Bernstein
            • Votes:
              12
              Watchers:
              25
