Solr
  1. Solr
  2. SOLR-2092

Use a native priority queue to order facet results

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.0-ALPHA
    • Component/s: None
    • Labels:
      None

      Description

      Deep paging into facets can be improved via use of a native priority queue.
      Comparisons will be faster, and there will be less GC.

      1. SOLR-2092.patch
        18 kB
        Yonik Seeley
      2. SOLR-2092.patch
        15 kB
        Yonik Seeley

        Activity

        Hide
        Yonik Seeley added a comment -

        Here's a draft patch with only UnInvertedField converted from using BoundedTreeSet to LongPriorityQueue so far. I haven't performance tested this yet, but I imagine ord->term lookup will remain the dominant cost of large facet.limit objects. Still, this should help cut down the maximum memory footprint too.

        I also managed to reuse the internal array from the priority queue (so now both arrays used to sort term ords are reused from previously allocated arrays).

        Show
        Yonik Seeley added a comment - Here's a draft patch with only UnInvertedField converted from using BoundedTreeSet to LongPriorityQueue so far. I haven't performance tested this yet, but I imagine ord->term lookup will remain the dominant cost of large facet.limit objects. Still, this should help cut down the maximum memory footprint too. I also managed to reuse the internal array from the priority queue (so now both arrays used to sort term ords are reused from previously allocated arrays).
        Hide
        Yonik Seeley added a comment -

        Here's an update that adds the long pq to the single-valued fc method.
        It also uses the missing count from calculated like all other counts (from the field cache).

        Show
        Yonik Seeley added a comment - Here's an update that adds the long pq to the single-valued fc method. It also uses the missing count from calculated like all other counts (from the field cache).
        Hide
        Yonik Seeley added a comment -

        Performance results:

        Faceting on a single valued field with 100K unique values.
        Base doc set size = 100K docs.
        facet.limit=10
        8 request threads (CPU = 4 core Phenom II)
        Average response time was measured externally (i.e. includes all request parsing, response writing, etc)

        facet.offset ms trunk ms patch
        0 18 17
        100 18 17
        1000 23 17
        10000 70 20
        100000 211 26

        So as expected, most of the performance benefit is when you are paging deep, but there are also slight improvements across the board.

        I plan on committing soon.

        Show
        Yonik Seeley added a comment - Performance results: Faceting on a single valued field with 100K unique values. Base doc set size = 100K docs. facet.limit=10 8 request threads (CPU = 4 core Phenom II) Average response time was measured externally (i.e. includes all request parsing, response writing, etc) facet.offset ms trunk ms patch 0 18 17 100 18 17 1000 23 17 10000 70 20 100000 211 26 So as expected, most of the performance benefit is when you are paging deep, but there are also slight improvements across the board. I plan on committing soon.

          People

          • Assignee:
            Unassigned
            Reporter:
            Yonik Seeley
          • Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development