Lucene - Core / LUCENE-5333

Support sparse faceting for heterogeneous indices

    Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: modules/facet
    • Labels: None
    • Lucene Fields: New

      Description

      In some search apps, e.g. a large e-commerce site, the index can have
      a mix of wildly different product categories and facet dimensions, and
      the number of dimensions could be huge.

      E.g. maybe the index has shirts, computer memory, hard drives, etc.,
      and each of these many categories has different attributes.

      In such an index, when someone searches for "so dimm", which should
      match a bunch of laptop memory modules, you can't (easily) know up
      front which facet dimensions will be important.

      But I think this is very easy for the facet module: since ords are
      stored "row stride" (each doc lists all facet labels it has), we could
      simply count all facets that the hits actually saw, and then at the
      end see which dimensions "got traction" and return facet results for
      these top dims.
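
      The counting idea above can be sketched in plain Java. This is only an
      illustration under stated assumptions, not the facet module's API and
      not what any of the attached patches do: the class SparseFacetSketch,
      the method topDims, and the hitOrds / ordToDim arrays are all
      hypothetical stand-ins for the per-document ordinals and the taxonomy.
      It also assumes that "got traction" means the highest total label
      count per dimension, which is just one possible measure.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch; none of these names belong to Lucene's facet module. */
final class SparseFacetSketch {

  /**
   * Counts every facet ordinal seen on the hits, rolls the counts up per
   * dimension, and returns the names of the topN dimensions by total count.
   *
   * @param hitOrds  hitOrds[i] lists the ordinals of the i-th matching doc ("row stride")
   * @param ordToDim ordToDim[ord] names the dimension that ordinal belongs to (assumed lookup)
   * @param topN     maximum number of dimensions to return
   */
  static List<String> topDims(int[][] hitOrds, String[] ordToDim, int topN) {
    // Pass 1: count all ordinals the hits actually carry; no dimensions are requested up front.
    int[] counts = new int[ordToDim.length];
    for (int[] ords : hitOrds) {
      for (int ord : ords) {
        counts[ord]++;
      }
    }

    // Pass 2: aggregate the non-zero ordinal counts per dimension.
    Map<String, Integer> dimCounts = new HashMap<>();
    for (int ord = 0; ord < counts.length; ord++) {
      if (counts[ord] > 0) {
        dimCounts.merge(ordToDim[ord], counts[ord], Integer::sum);
      }
    }

    // Sort by descending count, breaking ties by name so the result is deterministic.
    List<Map.Entry<String, Integer>> entries = new ArrayList<>(dimCounts.entrySet());
    entries.sort((a, b) -> {
      int byCount = Integer.compare(b.getValue(), a.getValue());
      return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
    });

    List<String> result = new ArrayList<>();
    for (int i = 0; i < Math.min(topN, entries.size()); i++) {
      result.add(entries.get(i).getKey());
    }
    return result;
  }
}
```

      Dimensions that no hit touches never enter the per-dimension map, so
      the cost of the second pass depends on the ordinal space rather than
      on a list of dimensions the caller had to name in advance.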

      I'm not sure what the API would look like, but conceptually this
      should work very well, because every hit already exposes all of its
      facet ords. You shouldn't have to state up front exactly which facet
      dimensions to count...

        Attachments

        1. LUCENE-5333.patch
          18 kB
          Shai Erera
        2. LUCENE-5333.patch
          13 kB
          Shai Erera
        3. LUCENE-5333.patch
          11 kB
          Michael McCandless

            People

            • Assignee: Unassigned
            • Reporter: Michael McCandless (mikemccand)