  1. Solr
  2. SOLR-1875

per-segment single valued string faceting

Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • None
    • Fix Version/s: 4.0-ALPHA, 4.0-BETA, 4.0, 6.0
    • None
    • None

    Description

      A little stepping stone to NRT:
      Per-segment single-valued string faceting using the Lucene FieldCache.

      Attachments

        1. ASF.LICENSE.NOT.GRANTED--SOLR-1875.patch
          22 kB
          Yonik Seeley
        2. ASF.LICENSE.NOT.GRANTED--SOLR-1875.patch
          35 kB
          Yonik Seeley

        Activity

          yseeley@gmail.com Yonik Seeley added a comment -

          OK, so the idea is pretty simple: reuse the existing algorithm for single valued string fields that uses the FieldCache.
          Count per-segment with a per-segment accumulator array, then merge all of the counts at the end (probably with a priority queue - same method used in MultiTermEnum). Seems like a good opportunity to introduce some threading and do the per-segment counting in parallel.
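
The counting and merging steps described above can be illustrated with a minimal Python sketch. It is not the code from the patch: it assumes each segment is modelled as a sorted list of unique terms plus a per-document ordinal array (roughly what a FieldCache string index provides), and that every document matches the query. The names `count_segment` and `facet_counts` are hypothetical. `heapq.merge` stands in for the priority-queue merge.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor
from itertools import groupby
from operator import itemgetter


def count_segment(segment):
    """Count term occurrences in one segment.

    segment is (sorted_terms, doc_ords); doc_ords[i] is the index of
    document i's term in sorted_terms, or -1 if the document has no value.
    """
    terms, doc_ords = segment
    counts = [0] * len(terms)  # per-segment accumulator array
    for term_ord in doc_ords:
        if term_ord >= 0:
            counts[term_ord] += 1
    # Emit (term, count) pairs in term order so the streams can be merged.
    return [(t, c) for t, c in zip(terms, counts) if c > 0]


def facet_counts(segments, threads=None):
    """Count every segment in parallel, then merge the sorted streams."""
    workers = threads or max(1, len(segments))  # default: one thread per segment
    with ThreadPoolExecutor(max_workers=workers) as pool:
        per_segment = list(pool.map(count_segment, segments))
    # heapq.merge does a priority-queue merge of the sorted streams;
    # groupby then sums the counts of equal terms from different segments.
    merged = heapq.merge(*per_segment, key=itemgetter(0))
    return {
        term: sum(count for _, count in group)
        for term, group in groupby(merged, key=itemgetter(0))
    }


segments = [
    (["apple", "banana"], [0, 1, 0, -1]),
    (["banana", "cherry"], [0, 0, 1]),
]
print(facet_counts(segments))
```

Because each segment has its own term ordinals, the per-segment counts cannot simply be added array to array; the merge by term value is the extra step that the whole-index approach does not need.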

          yseeley@gmail.com Yonik Seeley added a comment -

          Here's the first cut - seems to work fine.
          You can try it out with facet.method=fcs (the extra "s" can either stand for the plural, since there are multiple field caches, or for segment).

          I haven't introduced a way to limit the number of threads used... it's currently one per segment.
          I'm thinking of a local param named "threads" for that.

          Note: this will probably only make sense in NRT scenarios. It will take up more memory for the field caches, more memory per request for the accumulator arrays, and more CPU, since an additional merge step is needed. One possible side benefit is a reduction in field cache memory in cases that would otherwise suffer from field cache insanity (per-segment and whole-index field caches both being populated).

          yseeley@gmail.com Yonik Seeley added a comment -

          Here's an update:

          • Adds a local param "threads" to optionally control how many threads are dedicated to a facet command.
          • Reworks the simple facets test: the index is built only once, but it is shuffled and docs are sometimes duplicated (to test for deletion effects). This also required adding a way to turn off fieldcache sanity checking.
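
As a rough illustration of how a request might select this method, here is a small Python sketch that builds the query parameters. It assumes the "threads" local param is attached to the facet.field value using Solr's usual `{!key=value}` local-param prefix; the helper name `fcs_facet_params` and the field name `cat` are hypothetical and do not come from the patch.

```python
from urllib.parse import urlencode


def fcs_facet_params(field, threads=None, q="*:*"):
    """Build request parameters for per-segment faceting on one field."""
    # Assumption: the thread limit is passed as a local param on facet.field.
    local_params = f"{{!threads={threads}}}" if threads is not None else ""
    return {
        "q": q,
        "rows": "0",
        "facet": "true",
        "facet.method": "fcs",
        "facet.field": local_params + field,
    }


# URL-encoded query string, ready to append to a select handler URL.
print(urlencode(fcs_facet_params("cat", threads=2)))
```

Leaving `threads` unset sends the plain field name, which under the first cut of the patch would mean one thread per segment.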
          yseeley@gmail.com Yonik Seeley added a comment -

          I plan on committing this soon.
          The public API is very limited - just a "threads" local param, and "facet.method=fcs". Stuff like that can easily be changed post-commit of course.

          hossman Chris M. Hostetter added a comment -

          Correcting Fix Version based on CHANGES.txt; see this thread for more details: http://mail-archives.apache.org/mod_mbox/lucene-dev/201005.mbox/%3Calpine.DEB.1.10.1005251052040.24672@radix.cryptio.net%3E

          rcmuir Robert Muir added a comment -

          Yonik: did you intend to add the Apache license to the files in this patch (e.g. PerSegmentSingleValuedFaceting.java)?

          I noticed the box was not checked.


          hossman Chris M. Hostetter added a comment -

          Bulk fixing the version info for 4.0-ALPHA and 4.0; all affected issues have "hoss20120711-bulk-40-change" in a comment.

          rcmuir Robert Muir added a comment -

          rmuir20120906-bulk-40-change

          rcmuir Robert Muir added a comment -

          moving all 4.0 issues not touched in a month to 4.1

          ehatcher Erik Hatcher added a comment -

          Isn't this fully resolved for 4.0 (and alpha/beta as well)?

          uschindler Uwe Schindler added a comment -

          Closed after release.


          People

            Assignee: yseeley@gmail.com Yonik Seeley
            Reporter: yseeley@gmail.com Yonik Seeley
            Votes: 0
            Watchers: 3
