[LUCENE-7304] Doc values based block join implementation - ASF JIRA

Details

Type: Improvement
Status: Open
Priority: Minor
Resolution: Unresolved
Affects Version/s: None
Fix Version/s: None
Component/s: None
Labels: None
Lucene Fields: New

Description

At query time, the block join relies on a bitset to find the previous parent doc while advancing the doc id iterator. On large indices these bitsets can consume large amounts of JVM heap space. Also, because of the way these bitsets are populated, the 'FixedBitSet' implementation is typically used.

The idea I had was to replace the bitset with a numeric doc values field that stores offsets. Each child doc stores how many docids it is away from its parent doc, and each parent stores how many docids it is away from its first child. At query time this information can be used to perform the block join.
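The offset scheme described above can be sketched in plain Java. This is only an illustration and not the code from the attached patches: a plain `int[]` stands in for the numeric doc values field, each block is assumed to be laid out with its children first and the parent as the last doc, and the caller is assumed to already know whether a given doc is a child or a parent. The class and method names (`BlockJoinOffsets`, `buildOffsets`, `parentOf`, `firstChildOf`) are hypothetical.

```java
import java.util.Arrays;

// Hypothetical sketch: an int[] stands in for a numeric doc values field.
final class BlockJoinOffsets {

    // Builds the per-doc offsets for a sequence of blocks.
    // childCounts[i] is the number of children in block i; each block is
    // assumed to be indexed as its children followed by the parent doc.
    static int[] buildOffsets(int[] childCounts) {
        int maxDoc = 0;
        for (int count : childCounts) {
            maxDoc += count + 1; // children plus one parent
        }
        int[] offsets = new int[maxDoc];
        int firstDocInBlock = 0;
        for (int count : childCounts) {
            int parentDoc = firstDocInBlock + count;
            for (int childDoc = firstDocInBlock; childDoc < parentDoc; childDoc++) {
                // Child: distance in docids to its parent.
                offsets[childDoc] = parentDoc - childDoc;
            }
            // Parent: distance in docids to its first child (0 if it has none).
            offsets[parentDoc] = count;
            firstDocInBlock = parentDoc + 1;
        }
        return offsets;
    }

    // The caller must know that childDoc is a child doc.
    static int parentOf(int[] offsets, int childDoc) {
        return childDoc + offsets[childDoc];
    }

    // The caller must know that parentDoc is a parent doc.
    // Returns parentDoc itself when the parent has no children.
    static int firstChildOf(int[] offsets, int parentDoc) {
        return parentDoc - offsets[parentDoc];
    }

    public static void main(String[] args) {
        // Three blocks: 3 children + parent, a childless parent, 2 children + parent.
        int[] offsets = buildOffsets(new int[] {3, 0, 2});
        System.out.println(Arrays.toString(offsets));
        System.out.println("parent of doc 1: " + parentOf(offsets, 1));
        System.out.println("first child of doc 7: " + firstChildOf(offsets, 7));
    }
}
```

In this sketch, finding a child's parent or a parent's first child is a single lookup plus an addition or subtraction, so no per-segment parent bitset has to be held on the heap.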

I think another benefit of this approach is that external tools can now easily determine whether a doc is part of a block of documents. Perhaps this also helps index-time sorting?

Attachments

- LUCENE-7304-20160606.patch, 06/Jun/16 20:03, 110 kB, Paul Elschot
- LUCENE-7304-20160531.patch, 31/May/16 20:44, 10 kB, Paul Elschot
- LUCENE-7304.patch, 24/May/17 13:22, 25 kB, Martijn van Groningen
- LUCENE-7304.patch, 22/Jun/17 12:05, 25 kB, Martijn van Groningen
- LUCENE-5092-20140313.patch, 30/May/16 18:14, 25 kB, Paul Elschot
- LUCENE_7304.patch, 26/May/16 14:05, 17 kB, Martijn van Groningen
- LUCENE_7304.patch, 07/Jun/16 13:19, 22 kB, Martijn van Groningen

People

Assignee: Unassigned

Reporter: Martijn van Groningen

Votes: 0

Watchers: 7

Dates

Created: 26/May/16 14:02

Updated: 28/Aug/22 14:58