[SOLR-1875] per-segment single valued string faceting - ASF JIRA

Details

Type: New Feature
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 4.0-ALPHA, 4.0-BETA, 4.0, 6.0
Component/s: None
Labels: None

Description

A little stepping stone to NRT:
Per-segment single-valued string faceting using the Lucene FieldCache.

Attachments


ASF.LICENSE.NOT.GRANTED--SOLR-1875.patch
12/Apr/10 00:52
35 kB
Yonik Seeley
ASF.LICENSE.NOT.GRANTED--SOLR-1875.patch
10/Apr/10 19:06
22 kB
Yonik Seeley

Activity


Yonik Seeley added a comment - 10/Apr/10 14:15

OK, so the idea is pretty simple: reuse the existing algorithm for single valued string fields that uses the FieldCache.
Count per-segment with a per-segment accumulator array, then merge all of the counts at the end (probably with a priority queue - same method used in MultiTermEnum). Seems like a good opportunity to introduce some threading and do the per-segment counting in parallel.
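Here is a minimal Java sketch of the count-then-merge idea. It assumes a simplified stand-in for each segment's FieldCache entry: a sorted `terms` array plus a per-document `ords` array, with -1 for documents that have no value. The `Segment`, `countSegment`, and `merge` names are hypothetical and are not the classes from the attached patch.

Each segment gets its own accumulator array. A priority queue then merges the per-segment (term, count) streams in term order, summing counts for terms that appear in more than one segment.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

public class PerSegmentFacetSketch {

    /** Hypothetical stand-in for one segment's FieldCache entry. */
    static final class Segment {
        final String[] terms; // sorted unique terms in this segment
        final int[] ords;     // ords[doc] = index into terms, or -1 if the doc has no value
        Segment(String[] terms, int[] ords) { this.terms = terms; this.ords = ords; }
    }

    /** Counts the matching docs of one segment into a per-segment accumulator array. */
    static int[] countSegment(Segment seg, int[] matchingDocs) {
        int[] counts = new int[seg.terms.length];
        for (int doc : matchingDocs) {
            int ord = seg.ords[doc];
            if (ord >= 0) counts[ord]++;
        }
        return counts;
    }

    /** Cursor over one segment's (term, count) pairs, advanced in term order. */
    static final class Cursor {
        final String[] terms;
        final int[] counts;
        int pos = 0;
        Cursor(String[] terms, int[] counts) { this.terms = terms; this.counts = counts; }
        String term() { return terms[pos]; }
    }

    /** Merges per-segment counts in term order with a priority queue, summing equal terms. */
    static Map<String, Integer> merge(List<String[]> segTerms, List<int[]> segCounts) {
        PriorityQueue<Cursor> pq = new PriorityQueue<>((a, b) -> a.term().compareTo(b.term()));
        for (int i = 0; i < segTerms.size(); i++) {
            if (segTerms.get(i).length > 0) pq.add(new Cursor(segTerms.get(i), segCounts.get(i)));
        }
        // Terms leave the queue in sorted order, so insertion order is term order.
        Map<String, Integer> merged = new LinkedHashMap<>();
        while (!pq.isEmpty()) {
            Cursor c = pq.poll();
            merged.merge(c.term(), c.counts[c.pos], Integer::sum);
            if (++c.pos < c.terms.length) pq.add(c); // re-insert positioned at its next term
        }
        return merged;
    }

    public static void main(String[] args) {
        Segment s0 = new Segment(new String[] {"blue", "red"}, new int[] {0, 1, 1, -1});
        Segment s1 = new Segment(new String[] {"green", "red"}, new int[] {1, 0, 1});
        int[] c0 = countSegment(s0, new int[] {0, 1, 2, 3}); // blue=1, red=2
        int[] c1 = countSegment(s1, new int[] {0, 2});       // green=0, red=2
        System.out.println(merge(List.of(s0.terms, s1.terms), List.of(c0, c1)));
        // prints {blue=1, green=0, red=4}
    }
}
```

Picking the top `facet.limit` terms by count would be a separate pass over the merged result. That step is omitted here.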


Yonik Seeley added a comment - 10/Apr/10 19:06

Here's the first cut - seems to work fine.
You can try it out with facet.method=fcs (the extra "s" can either stand for the plural, since there are multiple field caches, or for segment).

I haven't introduced a way to limit the number of threads used... it's currently one per segment.
I'm thinking of a local param named "threads" for that.
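To illustrate bounding the number of threads, here is a sketch that submits one counting task per segment to a fixed-size `ExecutorService`. The `threads` argument mirrors the proposed local param. Treating a non-positive value as "one thread per segment" is an assumption of this sketch, not documented behavior. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelSegmentCounting {

    /** Counts ords for one segment's matching docs; -1 means the doc has no value. */
    static int[] countSegment(int[] ords, int numTerms) {
        int[] counts = new int[numTerms];
        for (int ord : ords) if (ord >= 0) counts[ord]++;
        return counts;
    }

    /**
     * Runs one counting task per segment on at most `threads` worker threads.
     * threads <= 0 is treated as one thread per segment (an assumption of this sketch).
     */
    static List<int[]> countAll(List<int[]> segOrds, List<Integer> segNumTerms, int threads)
            throws InterruptedException, ExecutionException {
        int n = segOrds.size();
        int poolSize = (threads <= 0) ? Math.max(1, n) : Math.min(threads, Math.max(1, n));
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            List<Future<int[]>> futures = new ArrayList<>();
            for (int i = 0; i < n; i++) {
                final int seg = i;
                futures.add(pool.submit(() -> countSegment(segOrds.get(seg), segNumTerms.get(seg))));
            }
            List<int[]> results = new ArrayList<>();
            for (Future<int[]> f : futures) results.add(f.get()); // preserves segment order
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<int[]> ords = List.of(new int[] {0, 1, 1, -1}, new int[] {1, 0, 0});
        List<int[]> counts = countAll(ords, List.of(2, 2), 2);
        System.out.println(Arrays.toString(counts.get(0))); // [1, 2]
        System.out.println(Arrays.toString(counts.get(1))); // [2, 1]
    }
}
```

The per-segment results would then feed the same term-ordered merge as in the single-threaded case. Only the counting phase runs in parallel.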

Note: this will probably only make sense in NRT scenarios. It will take up more memory for the field caches, more memory per-request for the accumulator arrays, and more CPU since an additional merge step is needed. One possible side benefit is a reduction in field cache memory (due to field cache insanity - per-segment and whole-index field caches both being populated).
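The extra per-request memory comes from sizing each accumulator array by its segment's number of unique terms. A term present in several segments gets a counter in each of them. The small Java calculation below uses hypothetical term sets to show the effect. It counts only the accumulator slots and ignores the FieldCache entries themselves.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class AccumulatorSizing {

    /** One int counter per unique term per segment. */
    static int perSegmentSlots(List<Set<String>> segments) {
        int total = 0;
        for (Set<String> terms : segments) total += terms.size();
        return total;
    }

    /** One int counter per unique term across the whole index. */
    static int wholeIndexSlots(List<Set<String>> segments) {
        Set<String> all = new HashSet<>();
        for (Set<String> terms : segments) all.addAll(terms);
        return all.size();
    }

    public static void main(String[] args) {
        List<Set<String>> segments = List.of(
            Set.of("blue", "red"),
            Set.of("green", "red"),
            Set.of("blue", "green", "red"));
        System.out.println(perSegmentSlots(segments)); // 7
        System.out.println(wholeIndexSlots(segments)); // 3
    }
}
```

Here three segments need 7 counters in total, while a single whole-index array would need only 3.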


Yonik Seeley added a comment - 12/Apr/10 00:52

Here's an update:

- Adds a local param "threads" to optionally control how many threads are dedicated to a facet command.
- Reworks the simple facets test... the index is built only once, but shuffled and docs are sometimes duplicated (to test for deletion effects). This also required adding a way to turn off fieldcache sanity checking.


Yonik Seeley added a comment - 28/Apr/10 19:46

I plan on committing this soon.
The public API is very limited - just a "threads" local param, and "facet.method=fcs". Stuff like that can easily be changed post-commit of course.
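For reference, here is a sketch of building a request that uses the two knobs mentioned above. The field name `category` and the `/select` handler path are assumptions for illustration. So is attaching the local param to `facet.field` (as in `{!threads=4}category`). The sketch only constructs and URL-encodes the query string. It does not contact a Solr server.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

public class FcsRequestSketch {

    /** Builds an encoded query string requesting per-segment faceting on one field. */
    static String buildQuery(String field, int threads) {
        Map<String, String> params = new LinkedHashMap<>(); // keeps parameter order stable
        params.put("q", "*:*");
        params.put("facet", "true");
        params.put("facet.method", "fcs");                                  // per-segment FieldCache faceting
        params.put("facet.field", "{!threads=" + threads + "}" + field);    // assumed local-param placement
        return params.entrySet().stream()
            .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8) + "="
                    + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
            .collect(Collectors.joining("&"));
    }

    public static void main(String[] args) {
        System.out.println("/select?" + buildQuery("category", 4));
        // prints /select?q=*%3A*&facet=true&facet.method=fcs&facet.field=%7B%21threads%3D4%7Dcategory
    }
}
```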


Chris M. Hostetter added a comment - 28/May/10 03:27

Correcting Fix Version based on CHANGES.txt, see this thread for more details...

http://mail-archives.apache.org/mod_mbox/lucene-dev/201005.mbox/%3Calpine.DEB.1.10.1005251052040.24672@radix.cryptio.net%3E


Robert Muir added a comment - 30/Jan/11 13:59

Yonik: did you intend to add the Apache license to this file (e.g. PerSegmentSingleValuedFaceting.java)?

I noticed the box was not checked.


Chris M. Hostetter added a comment - 11/Jul/12 22:25

bulk fixing the version info for 4.0-ALPHA and 4.0. All affected issues have "hoss20120711-bulk-40-change" in the comment.


Robert Muir added a comment - 07/Aug/12 03:43

rmuir20120906-bulk-40-change


Robert Muir added a comment - 10/Sep/12 17:41

moving all 4.0 issues not touched in a month to 4.1


Erik Hatcher added a comment - 30/Sep/12 23:00

Isn't this fully resolved for 4.0 (and alpha/beta as well)?


Uwe Schindler added a comment - 10/May/13 10:33

Closed after release.


People

Assignee: Yonik Seeley

Reporter: Yonik Seeley

Votes: 0

Watchers: 3

Dates

Created: 10/Apr/10 14:11

Updated: 09/May/16 18:49

Resolved: 01/Oct/12 17:20
