[LUCENE-4771] Query-time join collectors could maybe be more efficient - ASF JIRA

Details

Type: Improvement
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: None
Fix Version/s: None
Component/s: modules/join
Labels: None
Lucene Fields: New

Description

I was looking at these collectors on LUCENE-4765 and I noticed:

SingleValued collector (SV) pulls FieldCache.getTerms and adds the bytes to a BytesRefHash on every collect.
MultiValued collector (MV) pulls FieldCache.getDocTermsOrds, but doesn't use the ords; it just looks up each value and adds the bytes on every collect.

I think it's worth investigating instead whether SV should use getTermsIndex, with both collectors just collecting their per-segment ords in something like a BitSet[maxOrd].

When asked for the terms at the end in getCollectorTerms(), they could merge these into one BytesRefHash.
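To make the idea concrete, here is a minimal sketch in plain Java with no Lucene dependency. Everything in it is a hypothetical stand-in: `Segment` plays the role of a per-segment terms index (a sorted term dictionary plus one ord per document), `OrdCollector` is the proposed collector, and a `TreeSet<String>` stands in for the final BytesRefHash. It is not the code from the attached patches.

```java
import java.util.BitSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class OrdJoinSketch {

    /** Hypothetical stand-in for one segment: sorted unique terms, plus one ord per doc (-1 = no value). */
    static final class Segment {
        final String[] sortedTerms;
        final int[] docToOrd;

        Segment(String[] sortedTerms, int[] docToOrd) {
            this.sortedTerms = sortedTerms;
            this.docToOrd = docToOrd;
        }
    }

    /** Per-segment collector: collect() only flips a bit, it never touches term bytes. */
    static final class OrdCollector {
        private final Segment segment;
        private final BitSet seenOrds;

        OrdCollector(Segment segment) {
            this.segment = segment;
            this.seenOrds = new BitSet(segment.sortedTerms.length); // sized to maxOrd
        }

        void collect(int doc) {
            int ord = segment.docToOrd[doc];
            if (ord >= 0) {
                seenOrds.set(ord);
            }
        }

        /** Resolve each collected ord to its term exactly once, at the end. */
        void addTermsTo(Set<String> out) {
            for (int ord = seenOrds.nextSetBit(0); ord >= 0; ord = seenOrds.nextSetBit(ord + 1)) {
                out.add(segment.sortedTerms[ord]);
            }
        }
    }

    /** Rough analogue of getCollectorTerms(): merge all per-segment ord sets into one term set. */
    static Set<String> mergeTerms(List<OrdCollector> collectors) {
        Set<String> merged = new TreeSet<>();
        for (OrdCollector collector : collectors) {
            collector.addTermsTo(merged);
        }
        return merged;
    }
}
```

With this shape, a term that matches in many documents of a segment is looked up once per segment instead of once per collected document, and the same term occurring in several segments is deduplicated only during the final merge.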

Of course, if you are going to turn around and execute the query against the same searcher anyway (is this the typical case?), this could be even more efficient: there is no need to hash or instantiate all the terms in memory. We could postpone the lookups to SeekingTermSetTermsEnum.accept()/nextSeekTerm(), I think... somehow.
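One way the deferred lookup might look, again as a plain-Java sketch under assumptions rather than actual Lucene code: `LazyOrdTerms` is a hypothetical class that holds only a segment's sorted term dictionary and the collected ord bits. Its `accept(int ord)` is a bare bit test, and its iterator jumps straight to the next collected ord, loosely mirroring what accept() and nextSeekTerm() would do. It only makes sense when the second query runs against the same segments that produced the ords.

```java
import java.util.BitSet;
import java.util.Iterator;
import java.util.NoSuchElementException;

public class LazyOrdTerms implements Iterable<String> {
    private final String[] sortedTerms;  // hypothetical per-segment term dictionary, indexed by ord
    private final BitSet collectedOrds;  // ords recorded during collection

    public LazyOrdTerms(String[] sortedTerms, BitSet collectedOrds) {
        this.sortedTerms = sortedTerms;
        this.collectedOrds = collectedOrds;
    }

    /** Membership check by ord: no term bytes are hashed or copied. */
    public boolean accept(int ord) {
        return collectedOrds.get(ord);
    }

    /** Visits only the collected terms, in term order, resolving each one on demand. */
    @Override
    public Iterator<String> iterator() {
        return new Iterator<String>() {
            private int nextOrd = collectedOrds.nextSetBit(0);

            @Override
            public boolean hasNext() {
                return nextOrd >= 0;
            }

            @Override
            public String next() {
                if (nextOrd < 0) {
                    throw new NoSuchElementException();
                }
                String term = sortedTerms[nextOrd];
                // Skip directly to the next collected ord instead of scanning every term.
                nextOrd = collectedOrds.nextSetBit(nextOrd + 1);
                return term;
            }
        };
    }
}
```

The trade-off is that the ords are meaningful only within the segment that produced them, so this shortcut does not apply when the "from" and "to" sides of the join live in different indexes.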

Attachments

- LUCENE-4771_prototype_without_bug.patch (11/Feb/13 20:10, 15 kB, Robert Muir)
- LUCENE-4771_prototype.patch (11/Feb/13 20:03, 15 kB, Robert Muir)
- LUCENE-4771-prototype.patch (25/Feb/13 12:56, 18 kB, Martijn van Groningen)

People

Assignee: Unassigned

Reporter: Robert Muir

Votes: 0

Watchers: 4

Dates

Created: 11/Feb/13 16:36

Updated: 28/Aug/22 13:39