SOLR-1875

per-segment single valued string faceting

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.0-ALPHA, 4.0-BETA, 4.0, 6.0
    • Component/s: None
    • Labels: None

      Description

      A little stepping stone to NRT:
      Per-segment single-valued string faceting using the Lucene FieldCache.

        Activity

        Yonik Seeley added a comment -

        OK, so the idea is pretty simple: reuse the existing algorithm for single-valued string fields that uses the FieldCache.
        Count per-segment with a per-segment accumulator array, then merge all of the counts at the end (probably with a priority queue, the same method used in MultiTermEnum). This seems like a good opportunity to introduce some threading and do the per-segment counting in parallel.
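
The following is a minimal sketch of the approach described above, not the code attached to this issue. It counts ordinals per segment into an accumulator array (one task per segment on a thread pool), then merges the per-segment counts with a priority queue ordered by term. The `Segment` class, its `sortedTerms` and `docToOrd` fields, and the `facet` method are hypothetical stand-ins for a per-segment FieldCache entry; they are assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class PerSegmentFacetSketch {

    /** Hypothetical stand-in for one segment's single-valued string field cache. */
    static final class Segment {
        final String[] sortedTerms; // unique terms of this segment, in sorted order
        final int[] docToOrd;       // per-document index into sortedTerms, -1 if no value

        Segment(String[] sortedTerms, int[] docToOrd) {
            this.sortedTerms = sortedTerms;
            this.docToOrd = docToOrd;
        }
    }

    /** Cursor over one segment's terms and counts, used during the merge. */
    static final class Cursor {
        final Segment seg;
        final int[] counts;
        int pos;

        Cursor(Segment seg, int[] counts) {
            this.seg = seg;
            this.counts = counts;
        }

        String term() {
            return seg.sortedTerms[pos];
        }
    }

    /** Per-segment accumulator array: one counter per term ordinal. */
    static int[] countSegment(Segment seg) {
        int[] counts = new int[seg.sortedTerms.length];
        for (int ord : seg.docToOrd) {
            if (ord >= 0) {
                counts[ord]++;
            }
        }
        return counts;
    }

    /** Counts each segment in parallel, then merges the counts in term order. */
    static Map<String, Integer> facet(List<Segment> segments, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, threads));
        try {
            List<Future<int[]>> futures = new ArrayList<>();
            for (Segment seg : segments) {
                Callable<int[]> task = () -> countSegment(seg);
                futures.add(pool.submit(task));
            }
            // The queue always exposes the cursor with the smallest current term.
            PriorityQueue<Cursor> queue =
                new PriorityQueue<>((a, b) -> a.term().compareTo(b.term()));
            for (int i = 0; i < segments.size(); i++) {
                int[] counts = futures.get(i).get();
                if (counts.length > 0) {
                    queue.add(new Cursor(segments.get(i), counts));
                }
            }
            Map<String, Integer> merged = new LinkedHashMap<>();
            while (!queue.isEmpty()) {
                Cursor c = queue.poll();
                int n = c.counts[c.pos];
                if (n > 0) {
                    merged.merge(c.term(), n, Integer::sum); // same term from several segments
                }
                if (++c.pos < c.seg.sortedTerms.length) {
                    queue.add(c); // re-insert with the cursor advanced to its next term
                }
            }
            return merged;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        Segment a = new Segment(new String[] {"apple", "banana"}, new int[] {0, 1, 1, -1});
        Segment b = new Segment(new String[] {"banana", "cherry"}, new int[] {0, 1, 1});
        System.out.println(facet(Arrays.asList(a, b), 2));
    }
}
```

Because each segment keeps its own sorted term list, the same term can have a different ordinal in each segment, so the merge has to match on the term itself rather than on the ordinal. That is the extra merge cost that a single whole-index cache avoids.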

        Yonik Seeley added a comment -

        Here's the first cut - seems to work fine.
        You can try it out with facet.method=fcs (the extra "s" can either stand for the plural, since there are multiple field caches, or for segment).

        I haven't introduced a way to limit the number of threads used... it's currently one per segment.
        I'm thinking of a local param named "threads" for that.

        Note: this will probably only make sense in NRT scenarios. It takes more memory for the field caches, more memory per request for the accumulator arrays, and more CPU, since an additional merge step is needed. One possible side benefit is a reduction in field cache memory in cases of field cache insanity, where per-segment and whole-index field caches would otherwise both be populated.

        Yonik Seeley added a comment -

        Here's an update:

        • adds a local param "threads" to optionally control how many threads are dedicated to a facet command
        • reworks the simple facets test: the index is built only once, but it is shuffled and docs are sometimes duplicated (to test for deletion effects). This also required adding a way to turn off FieldCache sanity checking.
        Yonik Seeley added a comment -

        I plan on committing this soon.
        The public API is very limited - just a "threads" local param, and "facet.method=fcs". Stuff like that can easily be changed post-commit of course.
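
As a rough illustration of that public API, the sketch below assembles the request parameters for a per-segment facet request. The `facet.method=fcs` value and the "threads" local param come from the comments above; the class and method names, the field name `category`, the match-all query, and the thread count are hypothetical, and the `{!threads=N}` prefix on `facet.field` assumes the usual Solr local-param syntax.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class FcsRequestSketch {

    /** Builds the (unencoded) request parameters for a per-segment facet request. */
    static Map<String, String> facetParams(String field, int threads) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", "*:*");              // assumed match-all query
        params.put("facet", "true");
        params.put("facet.method", "fcs");   // per-segment field cache faceting
        // Assumed local-param form for limiting the threads used by this facet command.
        params.put("facet.field", "{!threads=" + threads + "}" + field);
        return params;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, String> e : facetParams("category", 4).entrySet()) {
            System.out.println(e.getKey() + "=" + e.getValue());
        }
    }
}
```

The values are shown unencoded for readability; they would need URL encoding before being sent over HTTP.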

        Hoss Man added a comment -

        Correcting Fix Version based on CHANGES.txt, see this thread for more details...

        http://mail-archives.apache.org/mod_mbox/lucene-dev/201005.mbox/%3Calpine.DEB.1.10.1005251052040.24672@radix.cryptio.net%3E

        Robert Muir added a comment -

        Yonik: did you intend to add the Apache license to this file (e.g. PerSegmentSingleValuedFaceting.java)?

        I noticed the box was not checked.

        Hoss Man added a comment -

        Bulk fixing the version info for 4.0-ALPHA and 4.0; all affected issues have "hoss20120711-bulk-40-change" in a comment.

        Robert Muir added a comment -

        rmuir20120906-bulk-40-change

        Robert Muir added a comment -

        moving all 4.0 issues not touched in a month to 4.1

        Erik Hatcher added a comment -

        Isn't this fully resolved for 4.0 (and alpha/beta as well)?

        Uwe Schindler added a comment -

        Closed after release.


          People

          • Assignee: Yonik Seeley
          • Reporter: Yonik Seeley
          • Votes: 0
          • Watchers: 3
