SOLR-8559: FCS facet performance optimization

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Implemented
    • Affects Version/s: 5.5, 6.0
    • Fix Version/s: 5.5
    • Component/s: faceting
    • Flags: Patch

      Description

      While profiling a large collection (multi-sharded, billions of documents), I found that a fast query (5-10 ms) with no matches would take 20-30 seconds when faceting, even with facet.mincount=1.

      Profiling made it apparent that with facet.method=fcs 99% of the time was spent here.

      queue.updateTop gets called numOfSegments*numTerms times in the worst case, which is when every term is in every segment. This formula doesn't take into account whether or not any of the terms have a positive count with respect to the docset.
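
      To make the worst-case formula concrete, here is a minimal sketch of a multi-segment term merge that counts queue re-orderings. It is not the Solr code: the class and method names are hypothetical, java.util.PriorityQueue stands in for the queue used by the fcs method, and a poll followed by a re-add stands in for one updateTop call.

```java
import java.util.PriorityQueue;

// Hypothetical illustration only; names do not come from the Solr source.
final class NaiveSegmentMergeSketch {

    /** Cursor over one segment's sorted term list. */
    static final class Cursor {
        final String[] terms;
        int pos = 0;

        Cursor(String[] terms) { this.terms = terms; }

        String current() { return terms[pos]; }

        /** Moves to the next term; returns false when the segment is exhausted. */
        boolean advance() { return ++pos < terms.length; }
    }

    /** Merges every segment term by term and returns the number of queue re-orderings. */
    static long countQueueUpdates(String[][] segments) {
        PriorityQueue<Cursor> queue =
            new PriorityQueue<>((a, b) -> a.current().compareTo(b.current()));
        for (String[] seg : segments) {
            if (seg.length > 0) queue.add(new Cursor(seg));
        }
        long updates = 0;
        while (!queue.isEmpty()) {
            Cursor top = queue.poll();
            updates++;                      // stands in for one updateTop call
            if (top.advance()) queue.add(top);
        }
        return updates;
    }

    public static void main(String[] args) {
        String[] terms = {"a", "b", "c", "d"};
        // Every term is in every segment: 3 segments x 4 terms = 12 updates.
        System.out.println(countQueueUpdates(new String[][] {terms, terms, terms}));
    }
}
```

      Note that the count depends only on how many terms the segments hold, not on how many documents matched, which is why a query with no matches can still be slow.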

      These optimizations aim to do two things:

      1. When mincount>0, don't include segments in which all terms have zero counts. This should significantly speed up processing when terms are high cardinality and the matching docset is small.
      2. FIXED TODO optimization: when mincount>0, move the segment position to the next non-zero term value.

      Both of these changes minimize the number of calls needed to the slow updateTop method.
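
      The two changes can be sketched as follows. This is an illustration under stated assumptions, not the patch itself: the Segment class, its per-term counts array, and the facet method are hypothetical, and java.util.PriorityQueue with poll and re-add stands in for the real queue and its updateTop call.

```java
import java.util.Map;
import java.util.PriorityQueue;
import java.util.TreeMap;

// Hypothetical illustration only; names do not come from the Solr source.
final class SkipZeroCountMergeSketch {

    /** One segment: sorted terms plus each term's count within the matching docset. */
    static final class Segment {
        final String[] terms;
        final int[] counts;
        int pos = -1;

        Segment(String[] terms, int[] counts) {
            this.terms = terms;
            this.counts = counts;
        }

        String current() { return terms[pos]; }

        int currentCount() { return counts[pos]; }

        /** Moves to the next term; with skipZeros set, passes over zero-count terms. */
        boolean advance(boolean skipZeros) {
            do {
                pos++;
            } while (skipZeros && pos < terms.length && counts[pos] == 0);
            return pos < terms.length;
        }
    }

    long queueUpdates; // number of simulated updateTop calls in the last facet() run

    Map<String, Integer> facet(Segment[] segments, int mincount) {
        boolean skipZeros = mincount > 0;
        queueUpdates = 0;
        PriorityQueue<Segment> queue =
            new PriorityQueue<>((a, b) -> a.current().compareTo(b.current()));
        for (Segment seg : segments) {
            // Change 1: a segment whose counts are all zero never enters the queue.
            if (seg.advance(skipZeros)) queue.add(seg);
        }
        Map<String, Integer> totals = new TreeMap<>();
        while (!queue.isEmpty()) {
            Segment top = queue.poll();
            queueUpdates++;
            totals.merge(top.current(), top.currentCount(), Integer::sum);
            // Change 2: jump straight to the next non-zero term in this segment.
            if (top.advance(skipZeros)) queue.add(top);
        }
        totals.values().removeIf(c -> c < mincount);
        return totals;
    }

    public static void main(String[] args) {
        String[] terms = {"a", "b", "c", "d"};
        Segment[] segments = {
            new Segment(terms, new int[] {0, 0, 2, 0}),
            new Segment(terms, new int[] {0, 0, 0, 0})
        };
        SkipZeroCountMergeSketch sketch = new SkipZeroCountMergeSketch();
        System.out.println(sketch.facet(segments, 1) + " updates=" + sketch.queueUpdates);
    }
}
```

      In this sketch the number of queue updates is bounded by the number of (segment, term) pairs with a non-zero count rather than by numOfSegments*numTerms. With mincount=0 every term must still be visited, so nothing is skipped.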

      1. solr-8559.patch
        3 kB
        Keith Laban
      2. SOLR-8559.patch
        3 kB
        Dennis Gove
      3. SOLR-8559-4-10-4.patch
        3 kB
        Keith Laban

        Activity

        Dennis Gove added a comment - - edited

        Are you able to create a test for this specific enhancement? Or if not, are there existing tests covering this code I can specifically check after applying the patch?

        Dennis Gove added a comment -

        Rebased off trunk. Keith will upload a 5x backport.

        Keith Laban added a comment -

        I have not written a specific test, but running TestRandomDVFaceting with a coverage tool shows complete coverage for the code in this patch.

        Keith Laban added a comment -

        The trunk patch applies cleanly to branch_5x.

        ASF subversion and git services added a comment -

        Commit 1725638 from dpgove@apache.org in branch 'dev/trunk'
        [ https://svn.apache.org/r1725638 ]

        SOLR-8559: FCS facet performance optimization

        Significantly speeds up processing when terms are high cardinality and the matching docset is small.
        When facet minCount > 0 and the number of matching documents is small (or 0) this enhancement
        prevents considering terms which have a 0 count. Also includes change to move to the next non-zero
        term value when selecting a segment position.

        ASF subversion and git services added a comment -

        Commit 1725639 from dpgove@apache.org in branch 'dev/branches/branch_5x'
        [ https://svn.apache.org/r1725639 ]

        SOLR-8559: FCS facet performance optimization

        Significantly speeds up processing when terms are high cardinality and the matching docset is small.
        When facet minCount > 0 and the number of matching documents is small (or 0) this enhancement
        prevents considering terms which have a 0 count. Also includes change to move to the next non-zero
        term value when selecting a segment position.

        Dennis Gove added a comment -

        Patch applied to both trunk and branch_5x.

        Keith Laban added a comment -

        Added a 4.10.4 patch as well.

        Dennis Gove added a comment -

        Thanks for this performance optimization, Keith!

        Keith Laban added a comment -

        After doing some performance testing, this change shows a speed improvement of about 4 orders of magnitude.

        For a collection with 100M documents, a query matching 2,675 documents and 1,964 unique terms (facet.mincount=1&facet.limit=-1&facet.method=fcs) went from 11664ms (about 11.7 seconds) to 30ms.

        David Smiley added a comment -

        Dennis Gove, please update the "Fix Version/s:" field when resolving issues to the particular non-trunk release this is fixed in. 5.5? It's multi-valued, and since including trunk is generally implied, I usually omit it. Also, I forget where I saw this, but I think we're always supposed to mark issues as "Resolved" (not "Closed"); at the next release they get bulk-updated to Closed by the RM.

        And thanks for getting this in.

        Dennis Gove added a comment -

        Thanks, Dave. I think I've been marking issues as closed. I'll keep this in mind going forward.


          People

          • Assignee: Dennis Gove
          • Reporter: Keith Laban
          • Votes: 0
          • Watchers: 5