SOLR-13132

Improve JSON "terms" facet performance when sorted by relatedness

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 7.4, 9.0
    • Fix Version/s: 8.7, 9.0
    • Component/s: Facet Module
    • Labels: None

    Description

      When sorting buckets by relatedness, the JSON "terms" facet must calculate relatedness for every term.

      The current implementation uses a standard uninverted approach (either docValues or UnInvertedField) to get facet counts over the domain base docSet. It then uses that initial pass as a pre-filter for a second-pass, inverted approach: fetching the docSet for each relevant term (i.e., count > minCount?) and calculating the intersection size of each of those sets with the domain base docSet.
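
To make the cost of this two-pass flow concrete, here is a minimal sketch in plain Java. It is not Solr code: the class and method names are hypothetical, `BitSet` stands in for a docSet, a small `int[][]` array stands in for the uninverted field, and the foreground and background sets are assumed to be the sets that relatedness intersects against. The `>=` comparison against `minCount` is also an assumption.

```java
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative only: none of these names exist in Solr. */
class TwoPassSketch {

    /** Builds a doc set from explicit doc ids. */
    static BitSet bits(int... docs) {
        BitSet set = new BitSet();
        for (int doc : docs) set.set(doc);
        return set;
    }

    /** Pass 1 (uninverted): walk the base docs and count each term they contain. */
    static int[] countOverBase(int[][] docToTermOrds, BitSet base, int numTerms) {
        int[] counts = new int[numTerms];
        for (int doc = base.nextSetBit(0); doc >= 0; doc = base.nextSetBit(doc + 1)) {
            for (int ord : docToTermOrds[doc]) counts[ord]++;
        }
        return counts;
    }

    /** Stand-in for fetching one term's docSet from the inverted index. */
    static BitSet termDocSet(int[][] docToTermOrds, int termOrd) {
        BitSet docs = new BitSet();
        for (int doc = 0; doc < docToTermOrds.length; doc++) {
            for (int ord : docToTermOrds[doc]) {
                if (ord == termOrd) docs.set(doc);
            }
        }
        return docs;
    }

    /** Size of the intersection of two doc sets, leaving both inputs unmodified. */
    static int intersectionSize(BitSet a, BitSet b) {
        BitSet copy = (BitSet) a.clone();
        copy.and(b);
        return copy.cardinality();
    }

    /**
     * Pass 2 (inverted): for every term that survives the pre-filter, build its
     * docSet and intersect it with each set of interest. This per-term work is
     * what grows with field cardinality. Returns termOrd -> {fgCount, bgCount}.
     */
    static Map<Integer, int[]> secondPass(int[][] docToTermOrds, int[] baseCounts,
                                          int minCount, BitSet fore, BitSet back) {
        Map<Integer, int[]> stats = new LinkedHashMap<>();
        for (int ord = 0; ord < baseCounts.length; ord++) {
            if (baseCounts[ord] < minCount) continue; // assumed inclusive minCount
            BitSet termDocs = termDocSet(docToTermOrds, ord);
            stats.put(ord, new int[] {
                intersectionSize(termDocs, fore),
                intersectionSize(termDocs, back)
            });
        }
        return stats;
    }

    public static void main(String[] args) {
        // Five docs, three terms; docs[d] lists the term ordinals of doc d.
        int[][] docs = {{0, 1}, {0}, {1, 2}, {0, 2}, {1}};
        int[] baseCounts = countOverBase(docs, bits(0, 1, 2, 3), 3);
        Map<Integer, int[]> stats =
            secondPass(docs, baseCounts, 1, bits(0, 1), bits(0, 1, 2, 3, 4));
        stats.forEach((ord, c) ->
            System.out.println("term " + ord + ": fg=" + c[0] + " bg=" + c[1]));
    }
}
```

The second pass performs one docSet construction and two intersections per surviving term, so its cost scales with the number of distinct terms rather than with the number of documents visited.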

      Over high-cardinality fields, the overhead of per-term docSet creation and set intersection increases request latency to the point where relatedness sort may not be usable in practice. For my use case, even after applying the patch for SOLR-13108, for a field with ~220k unique terms per core, QTimes for high-cardinality domain docSets were, e.g.: cardinality 1816684 = 9000ms, cardinality 5032902 = 18000ms.

      The attached patch brings the above example QTimes down to a manageable ~300ms and ~250ms respectively. The approach calculates uninverted facet counts over the domain base, foreground, and background docSets in parallel in a single pass. This allows us to take advantage of the efficiencies built into the standard uninverted processors (FacetFieldProcessorByArray[DV|UIF]), and avoids the per-term docSet creation and set intersection overhead.
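
The single-pass idea can be sketched as follows. This is an illustration under stated assumptions, not the code from the attached patch: `SweepCountSketch`, `sweep`, and `bits` are hypothetical names, `BitSet` stands in for a docSet, and an `int[][]` array stands in for the uninverted field (docValues or UnInvertedField).

```java
import java.util.Arrays;
import java.util.BitSet;

/** Illustrative only: none of these names exist in Solr. */
class SweepCountSketch {

    /** Builds a doc set from explicit doc ids. */
    static BitSet bits(int... docs) {
        BitSet set = new BitSet();
        for (int doc : docs) set.set(doc);
        return set;
    }

    /**
     * Visits each relevant doc once and accumulates per-term counts for all
     * three sets together. Returns counts[0] = base, counts[1] = foreground,
     * counts[2] = background, each indexed by term ordinal.
     */
    static int[][] sweep(int[][] docToTermOrds, BitSet base, BitSet fore,
                         BitSet back, int numTerms) {
        int[][] counts = new int[3][numTerms];

        // Only docs that belong to at least one of the three sets matter.
        BitSet union = (BitSet) base.clone();
        union.or(fore);
        union.or(back);

        for (int doc = union.nextSetBit(0); doc >= 0; doc = union.nextSetBit(doc + 1)) {
            boolean inBase = base.get(doc);
            boolean inFore = fore.get(doc);
            boolean inBack = back.get(doc);
            for (int ord : docToTermOrds[doc]) {
                if (inBase) counts[0][ord]++;
                if (inFore) counts[1][ord]++;
                if (inBack) counts[2][ord]++;
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Five docs, three terms; docs[d] lists the term ordinals of doc d.
        int[][] docs = {{0, 1}, {0}, {1, 2}, {0, 2}, {1}};
        int[][] counts = sweep(docs, bits(0, 1, 2, 3), bits(0, 1), bits(0, 1, 2, 3, 4), 3);
        System.out.println("base: " + Arrays.toString(counts[0]));
        System.out.println("fore: " + Arrays.toString(counts[1]));
        System.out.println("back: " + Arrays.toString(counts[2]));
    }
}
```

No per-term docSet is ever built here: the foreground and background counts needed for relatedness fall out of the same walk that produces the base counts, so the work depends on the number of documents visited rather than on the number of distinct terms.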

      Attachments

        1. SOLR-13132_testSweep.patch
          28 kB
          Chris M. Hostetter
        2. SOLR-13132.patch
          43 kB
          Michael Gibney
        3. SOLR-13132-benchmarks.tgz
          4 kB
          Michael Gibney
        4. SOLR-13132-with-cache.patch
          129 kB
          Michael Gibney
        5. SOLR-13132-with-cache-01.patch
          129 kB
          Michael Gibney

          People

            hossman Chris M. Hostetter
            magibney Michael Gibney
            Votes: 2
            Watchers: 7

              Time Tracking

                Estimated: Not Specified
                Remaining: 0h
                Logged: 2h