SOLR-11831

Skip second grouping step if group.limit is 1 (aka Las Vegas patch)


Details

    • Type: Improvement
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Component: Grouping

    Description

      When a grouped query asks for group.limit=1 only, it is possible to skip the second grouping step. On our test datasets this improved speed by around 40%.

      Essentially, in the first grouping step each shard returns its top K groups, ranked by the highest-scoring document in each group. The federator merges the top K groups from every shard, and in the second step all shards are asked to return the top documents for each of the top-ranking groups.
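      The federator-side merge in the first step can be pictured roughly as follows. This is a minimal sketch, not Solr code: the class FirstPhaseMerge, the type ShardGroup and the method mergeTopGroups are hypothetical names made up for illustration, and the ranking assumes a plain sort by relevance score.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative sketch only: hypothetical types, not Solr classes. */
class FirstPhaseMerge {

    /** A group as reported by one shard: its value and the score of its best document. */
    static final class ShardGroup {
        final String groupValue;
        final float topScore;

        ShardGroup(String groupValue, float topScore) {
            this.groupValue = groupValue;
            this.topScore = topScore;
        }
    }

    /** Merge the per-shard group lists and keep the K groups with the highest top score. */
    static List<ShardGroup> mergeTopGroups(List<List<ShardGroup>> perShard, int k) {
        Map<String, ShardGroup> best = new HashMap<>();
        for (List<ShardGroup> shardGroups : perShard) {
            for (ShardGroup g : shardGroups) {
                // The same group value may come back from several shards; keep the best score.
                best.merge(g.groupValue, g, (a, b) -> a.topScore >= b.topScore ? a : b);
            }
        }
        List<ShardGroup> merged = new ArrayList<>(best.values());
        // Highest score first; ties are broken by group value so the order is deterministic.
        merged.sort((a, b) -> {
            int byScore = Float.compare(b.topScore, a.topScore);
            return byScore != 0 ? byScore : a.groupValue.compareTo(b.groupValue);
        });
        return new ArrayList<>(merged.subList(0, Math.min(k, merged.size())));
    }
}
```

      In the regular two-step flow, the group values returned by a merge like this are what the second request would send back to the shards.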

      If we only want to return the highest scoring document per group we can return the top document id in the first step, merge results in the federator to retain the top K groups and then skip the second grouping step entirely. This is possible provided that:

      a) We do not need to know the total number of matching documents per group.
      b) The within-group sort and the between-group sort are the same.
      c) We are not doing reranking, because reranking happens in the second grouping step. (It is possible to make this work with reranking, but that requires more work and some additional assumptions.)
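
      Conditions a) to c) and the single-pass merge can be sketched as below. This is an illustration under stated assumptions, not the patch itself: GroupingRequest, TopGroup, canSkipSecondStep and mergeSinglePass are hypothetical names, the sort specifications are compared as plain strings, and groups are ranked by relevance score only.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative sketch only: hypothetical types, not Solr classes and not the patch itself. */
class SkipSecondStepSketch {

    /** Hypothetical summary of the request properties that conditions a) to c) refer to. */
    static final class GroupingRequest {
        final int groupLimit;
        final boolean needsPerGroupCounts; // condition a)
        final String withinGroupSort;      // condition b)
        final String betweenGroupSort;     // condition b)
        final boolean reranking;           // condition c)

        GroupingRequest(int groupLimit, boolean needsPerGroupCounts,
                        String withinGroupSort, String betweenGroupSort, boolean reranking) {
            this.groupLimit = groupLimit;
            this.needsPerGroupCounts = needsPerGroupCounts;
            this.withinGroupSort = withinGroupSort;
            this.betweenGroupSort = betweenGroupSort;
            this.reranking = reranking;
        }
    }

    /** True when one document per group is wanted and conditions a) to c) all hold. */
    static boolean canSkipSecondStep(GroupingRequest r) {
        return r.groupLimit == 1
                && !r.needsPerGroupCounts
                && r.withinGroupSort.equals(r.betweenGroupSort)
                && !r.reranking;
    }

    /** First-step shard entry that already carries the id of the group's best document. */
    static final class TopGroup {
        final String groupValue;
        final String topDocId;
        final float topScore;

        TopGroup(String groupValue, String topDocId, float topScore) {
            this.groupValue = groupValue;
            this.topDocId = topDocId;
            this.topScore = topScore;
        }
    }

    /** Merge shard responses, keeping the top K groups together with their best document id. */
    static List<TopGroup> mergeSinglePass(List<List<TopGroup>> perShard, int k) {
        Map<String, TopGroup> best = new HashMap<>();
        for (List<TopGroup> shardGroups : perShard) {
            for (TopGroup g : shardGroups) {
                // Keep the entry whose top document scores highest across shards.
                best.merge(g.groupValue, g, (a, b) -> a.topScore >= b.topScore ? a : b);
            }
        }
        List<TopGroup> merged = new ArrayList<>(best.values());
        // Highest score first; ties are broken by group value so the order is deterministic.
        merged.sort((a, b) -> {
            int byScore = Float.compare(b.topScore, a.topScore);
            return byScore != 0 ? byScore : a.groupValue.compareTo(b.groupValue);
        });
        return new ArrayList<>(merged.subList(0, Math.min(k, merged.size())));
    }
}
```

      Because every merged entry already holds the id of its group's best document, the result can be returned without a second round trip to the shards.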

      This patch applies the grouping optimisation in cases where a)-c) apply and we are only sorting by relevance. It is also possible to extend this work to handle multiple sorting criteria and also reranking.

      P.S. Diego and I called this patch "Las Vegas" because we started writing it on the flight to Las Vegas for Lucene/Solr Revolution.

            People

              Assignee: Unassigned
              Reporter: Malvina Josephidou (mjosephidou)
              Votes: 1
              Watchers: 10

                Time Tracking

                  Original Estimate: Not Specified
                  Remaining Estimate: 0h
                  Time Spent: 17h 50m