Solr / SOLR-10634

Move calculation of some aggregations to collection phase

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 6.7, 7.0
    • Component/s: Facet Module
    • Labels: None

    Description

      From http://markmail.org/message/pwgnt7iqxkzcnckh

      The current code is optimized for finding the top K buckets out of
      a total of N.
      When one asks for the top 10 buckets and there are potentially
      millions of buckets, it makes sense to defer calculating other metrics
      for those buckets until we know which ones they are. After we
      identify the top 10 buckets, we calculate the domain for each of those
      buckets and use it to calculate the remaining metrics.
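
      The deferred approach described above can be sketched roughly as follows. This is only an illustration under simplifying assumptions, not Solr's facet code: the class `DeferredMetricsSketch`, the method `topKWithDeferredSum`, and the parallel-array input are all hypothetical, the sort metric is assumed to be the bucket count, and the deferred metric is assumed to be a sum.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/** Hypothetical illustration of "top K first, other metrics later"; not Solr code. */
public class DeferredMetricsSketch {

    /**
     * buckets[i] is the bucket value of document i, prices[i] is the field to sum.
     * Returns sum(price) for the k buckets with the highest document count.
     */
    static Map<String, Double> topKWithDeferredSum(String[] buckets, double[] prices, int k) {
        // Phase 1 (collection): accumulate only the sort metric (count) for every bucket.
        Map<String, Long> counts = new HashMap<>();
        for (String b : buckets) {
            counts.merge(b, 1L, Long::sum);
        }

        // Pick the top k buckets by count, breaking ties by bucket name.
        List<String> top = counts.entrySet().stream()
            .sorted(Map.Entry.<String, Long>comparingByValue().reversed()
                .thenComparing(Map.Entry.<String, Long>comparingByKey()))
            .limit(k)
            .map(Map.Entry::getKey)
            .collect(Collectors.toList());

        // Phase 2: for each winner, re-derive its domain (the matching documents)
        // and only then evaluate the deferred metric.
        Map<String, Double> sums = new LinkedHashMap<>();
        for (String bucket : top) {
            double sum = 0.0;
            for (int i = 0; i < buckets.length; i++) {
                if (buckets[i].equals(bucket)) {
                    sum += prices[i];
                }
            }
            sums.put(bucket, sum);
        }
        return sums;
    }
}
```

      In this sketch the second phase rescans the documents once per returned bucket, which is cheap for 10 buckets but grows with the number of buckets requested.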

      That approach is much slower when one requests all buckets. In that
      case we might as well calculate all metrics in the first pass rather
      than trying to defer them.

      So we should move aggregations from the second pass to the first pass under the following conditions:

      • no limit (or a high limit compared to the number of potential buckets?)
      • no sub-facets (or anything else) that will need the domain calculated anyway
      • the aggregation is not memory-intensive per slot (moving a calculation from per-bucket in the second phase to all-buckets-in-parallel in the first phase could be really bad for peak memory usage)
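
      The three conditions above could be combined into a single check along these lines. This is a minimal sketch, not the logic in the attached patch: `AggPhasePlanner`, `collectInFirstPhase`, and the `HIGH_LIMIT_FRACTION` threshold are hypothetical names and values, and a negative limit is assumed to mean "no limit". What counts as a "high" limit is left open in the description, so the 0.5 fraction is purely a placeholder.

```java
/** Hypothetical decision helper; names and threshold are assumptions, not Solr API. */
public class AggPhasePlanner {

    /** Assumed cutoff: a limit covering at least half of the potential buckets counts as "high". */
    static final double HIGH_LIMIT_FRACTION = 0.5;

    /**
     * Returns true if an aggregation should be computed for all buckets
     * during the first (collection) phase instead of being deferred.
     *
     * @param limit                  requested bucket limit; negative is assumed to mean "no limit"
     * @param potentialBuckets       estimated number of buckets in the field
     * @param needsDomainAnyway      true if sub-facets (or anything else) require the bucket domain
     * @param memoryIntensivePerSlot true if the aggregation holds a lot of state per bucket
     */
    static boolean collectInFirstPhase(int limit,
                                       long potentialBuckets,
                                       boolean needsDomainAnyway,
                                       boolean memoryIntensivePerSlot) {
        boolean unlimited = limit < 0;
        boolean highLimit = potentialBuckets > 0
            && limit >= potentialBuckets * HIGH_LIMIT_FRACTION;

        // All three conditions from the description must hold.
        return (unlimited || highLimit)
            && !needsDomainAnyway
            && !memoryIntensivePerSlot;
    }
}
```

      For example, `collectInFirstPhase(10, 1_000_000, false, false)` stays on the deferred path, while an unlimited request with no sub-facets and a cheap aggregation moves to the first phase.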

      Attachments

        1. SOLR-10634.patch
          9 kB
          Yonik Seeley
        2. SOLR-10634.patch
          3 kB
          Yonik Seeley

          People

            Assignee: Unassigned
            Reporter: Yonik Seeley
            Votes: 3
            Watchers: 7