  1. Solr
  2. SOLR-5725

Efficient facets without counts for enum method

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 6.3, 7.0
    • Component/s: search
    • Labels:
      None

      Description

      UPD: Specification

      To cap facet counts at 1, specify facet.exists=true. It can be used with facet.method=enum or when facet.method is omitted. It can be used only on non-trie fields, i.e. strings. It may speed up facet counting on large indices and/or high-cardinality facet values.
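      As a rough illustration of the specification above, the following sketch assembles the request parameters in plain Java. The parameter names (facet, facet.field, facet.method, facet.exists) come from this issue and from standard Solr faceting; the class name, the /select path, and the field name category_s are hypothetical placeholders.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper, not part of Solr or SolrJ.
class FacetExistsRequest {

    // Builds a URL-encoded query string for a facet request whose counts are capped at 1.
    static String buildQuery(String facetField) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", "*:*");
        params.put("rows", "0");               // only facets are needed, no documents
        params.put("facet", "true");
        params.put("facet.field", facetField); // must be a non-trie (string) field
        params.put("facet.method", "enum");    // may also be omitted, per the specification
        params.put("facet.exists", "true");    // cap every facet count at 1
        return params.entrySet().stream()
                .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                        + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
    }

    public static void main(String[] args) {
        // "category_s" is an assumed field name used only for illustration.
        System.out.println("/select?" + buildQuery("category_s"));
    }
}
```

      With facet.exists=true, each returned facet value should carry a count of 0 or 1 rather than the real number of matching documents.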

      Short version:

      This improves performance for facet.method=enum when it is enough to know that the facet count is greater than 0, for example when you dynamically populate filters on a search form. The new method checks whether two bitsets intersect instead of counting the intersection size.

      Long version:

      We have a dataset containing hundreds of millions of records. We facet by dozens of fields with many facet excludes, and the fields have a relatively small number of unique values, on the order of thousands.
      Before executing a search, users work with an "advanced search" form. Our goal is to populate dozens of filters with the values that are compatible with the other selected values, so basically this is a use case for facets with mincount=1, but without the need for actual counts.

      Our performance tests showed that facet.method=enum works much better than fc/fcs, probably due to a specific ratio of "docset" to "unique terms count". For example, the average query execution time was 1500ms with fc, 2600ms with fcs, and 280ms with enum. Profiling indicated that the majority of the time for enum was spent on intersecting docsets.

      Here's a patch that introduces an extension to facet calculation for method=enum. Basically, it uses docSetA.intersects(docSetB) instead of docSetA.intersectionSize(docSetB).

      As a result we were able to reduce our average query time from 280ms to 60ms.
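      The difference between the two calls can be sketched with java.util.BitSet standing in for Solr's DocSet. This is an assumption made for illustration only: the class and helper names below are hypothetical and are not the code in the patch. Counting has to examine the whole intersection, while an existence check can stop at the first shared bit.

```java
import java.util.BitSet;

// Illustrative stand-in for DocSet-based faceting; not the patch itself.
class IntersectsSketch {

    // Exact count: materializes the intersection and counts every common bit.
    static int intersectionSize(BitSet a, BitSet b) {
        BitSet copy = (BitSet) a.clone();
        copy.and(b);
        return copy.cardinality();
    }

    // Capped count: only asks whether at least one document is shared.
    static int cappedCount(BitSet base, BitSet termDocs) {
        return base.intersects(termDocs) ? 1 : 0;
    }

    // Small helper to build a bitset from document ids.
    static BitSet docs(int... ids) {
        BitSet bits = new BitSet();
        for (int id : ids) {
            bits.set(id);
        }
        return bits;
    }

    public static void main(String[] args) {
        BitSet base = docs(1, 5, 9);   // documents matching the query
        BitSet termA = docs(5, 9, 20); // documents containing facet value A
        BitSet termB = docs(2, 3);     // documents containing facet value B

        System.out.println(intersectionSize(base, termA)); // prints 2
        System.out.println(cappedCount(base, termA));      // prints 1
        System.out.println(intersectionSize(base, termB)); // prints 0
        System.out.println(cappedCount(base, termB));      // prints 0
    }
}
```

      When only mincount=1 style filtering is needed, the capped variant gives the same set of applicable facet values as the exact count.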

        Attachments

        1. facet.limit=0&facet.missing=true discrepancy between cloud and non-distr.txt
          4 kB
          Mikhail Khludnev
        2. SOLR-5725.patch
          45 kB
          Mikhail Khludnev
        3. SOLR-5725.patch
          45 kB
          Mikhail Khludnev
        4. SOLR-5725.patch
          56 kB
          Mikhail Khludnev
        5. SOLR-5725.patch
          56 kB
          Mikhail Khludnev
        6. SOLR-5725.patch
          48 kB
          Mikhail Khludnev
        7. SOLR-5725.patch
          47 kB
          Mikhail Khludnev
        8. SOLR-5725.patch
          3 kB
          Alexey Kozhemiakin
        9. SOLR-5725-5x.patch
          18 kB
          Sebastian Koziel
        10. SOLR-5725-master.patch
          120 kB
          Radoslaw Zielinski

              People

              • Assignee:
                mkhludnev Mikhail Khludnev
                Reporter:
                alexey_kozhemiakin Alexey Kozhemiakin
              • Votes:
                3
                Watchers:
                13