Lucene - Core
  1. Lucene - Core
  2. LUCENE-5371

Range faceting should use O(log(N)) search per hit

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.7, 6.0
    • Component/s: modules/facet
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      Today, Lucene's dynamic range faceting uses a simple linear search to
      find which ranges match, but there are known data structures to do
      this in log(N) time. I played with segment trees and wrote up a blog
      post here:

      http://blog.mikemccandless.com/2013/12/fast-range-faceting-using-segment-trees.html

      O(N) cost is actually OK when number of ranges is smallish, which is
      typical for facet use cases, but then scales badly if there are many
      ranges.

      1. LUCENE-5371.patch
        35 kB
        Michael McCandless
      2. LUCENE-5371.patch
        31 kB
        Michael McCandless

        Activity

        Hide
        Michael McCandless added a comment -

        Patch (applies to the LUCENE-5339 branch), implementing a simple
        segment tree solution. The per-hit increment is a binary search,
        O(log(N)).

        This only handles longs right now; I still need to do doubles. Maybe
        I can convert all doubles to longs and then reuse the LongRangeCounter
        class for doubles...

        I didn't use ASM; just "normal" java sources.

        Show
        Michael McCandless added a comment - Patch (applies to the LUCENE-5339 branch), implementing a simple segment tree solution. The per-hit increment is a binary search, O(log(N)). This only handles longs right now; I still need to do doubles. Maybe I can convert all doubles to longs and then reuse the LongRangeCounter class for doubles... I didn't use ASM; just "normal" java sources.
        Hide
        Michael McCandless added a comment -

        New patch, cutting over DoubleRangeFacetCounts too.

        I'll commit this to the branch and we can iterate there...

        Show
        Michael McCandless added a comment - New patch, cutting over DoubleRangeFacetCounts too. I'll commit this to the branch and we can iterate there...
        Hide
        ASF subversion and git services added a comment -

        Commit 1552086 from Michael McCandless in branch 'dev/branches/lucene5339'
        [ https://svn.apache.org/r1552086 ]

        LUCENE-5371: faster range faceting using segment trees

        Show
        ASF subversion and git services added a comment - Commit 1552086 from Michael McCandless in branch 'dev/branches/lucene5339' [ https://svn.apache.org/r1552086 ] LUCENE-5371 : faster range faceting using segment trees
        Hide
        ASF subversion and git services added a comment -

        Commit 1555338 from Michael McCandless in branch 'dev/trunk'
        [ https://svn.apache.org/r1555338 ]

        LUCENE-5371, LUCENE-5339: speed up range faceting from O(N) per hit to O(log(N)) using segment trees; simplify facet APIs

        Show
        ASF subversion and git services added a comment - Commit 1555338 from Michael McCandless in branch 'dev/trunk' [ https://svn.apache.org/r1555338 ] LUCENE-5371 , LUCENE-5339 : speed up range faceting from O(N) per hit to O(log(N)) using segment trees; simplify facet APIs
        Hide
        ASF subversion and git services added a comment -

        Commit 1555342 from Michael McCandless in branch 'dev/branches/branch_4x'
        [ https://svn.apache.org/r1555342 ]

        LUCENE-5371, LUCENE-5339: speed up range faceting from O(N) per hit to O(log(N)) using segment trees; simplify facet APIs

        Show
        ASF subversion and git services added a comment - Commit 1555342 from Michael McCandless in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1555342 ] LUCENE-5371 , LUCENE-5339 : speed up range faceting from O(N) per hit to O(log(N)) using segment trees; simplify facet APIs

          People

          • Assignee:
            Michael McCandless
            Reporter:
            Michael McCandless
          • Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development