[LUCENE-2878] Allow Scorer to expose positions and payloads aka. nuke spans - ASF JIRA

Details

Type: Improvement
Status: Closed
Priority: Major
Resolution: Implemented
Affects Version/s: Positions Branch
Fix Version/s: 7.4, 8.0
Component/s: core/search
Labels:
- gsoc2014

Lucene Fields: New, Patch Available

Description

Currently we have two somewhat separate types of queries: those that can make use of positions (mainly spans) and payloads (spans), and those that cannot. Yet Span*Query doesn't really score comparably to other queries, and in the end the two families duplicate a lot of code all over Lucene. Span*Queries are also limited to other Span*Query instances, so you cannot use a TermQuery or a BooleanQuery with SpanNear or anything like that.
Besides the Span*Query limitation, other queries lack a quite interesting feature: they cannot score based on term proximity, since scorers don't expose any positional information. All those problems have bugged me for a while now, so I started working on this using the bulkpostings API. I would have done this first cut on trunk, but TermScorer there works on a BlockReader that does not expose positions, while the one in this branch does. I started adding a new Positions class which users can pull from a scorer. To avoid creating unnecessary positions enums, I added ScorerContext#needsPositions and eventually Scorer#needsPayloads, so the corresponding enum is created on demand. So far, however, only TermQuery / TermScorer implements this API; other scorers simply return null instead.
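
The on-demand pattern described above can be sketched roughly as follows. This is a minimal, self-contained illustration, not code from the patch: `ScorerFlags`, `PositionIterator`, `SketchTermScorer` and its `positions()` method are hypothetical stand-ins for ScorerContext#needsPositions and the proposed Positions class. The scorer only allocates a positions iterator when the caller asked for positions up front, and returns null otherwise, mirroring how non-term scorers behave in this first cut.

```java
import java.util.Arrays;

// Hypothetical stand-in for ScorerContext: the caller declares up front whether
// it will need positions (and payloads), so the scorer can pick the right enum.
final class ScorerFlags {
    final boolean needsPositions;
    final boolean needsPayloads;

    ScorerFlags(boolean needsPositions, boolean needsPayloads) {
        this.needsPositions = needsPositions;
        this.needsPayloads = needsPayloads;
    }
}

// Hypothetical stand-in for the proposed Positions class: iterates the
// positions of the current document in increasing order.
final class PositionIterator {
    static final int NO_MORE_POSITIONS = Integer.MAX_VALUE;
    private final int[] positions;
    private int upto = -1;

    PositionIterator(int[] positions) {
        this.positions = positions;
    }

    int nextPosition() {
        upto++;
        return upto < positions.length ? positions[upto] : NO_MORE_POSITIONS;
    }
}

// Hypothetical term scorer over an in-memory posting for a single document.
final class SketchTermScorer {
    private final int[] docPositions;
    private final ScorerFlags flags;

    SketchTermScorer(int[] docPositions, ScorerFlags flags) {
        this.docPositions = Arrays.copyOf(docPositions, docPositions.length);
        this.flags = flags;
    }

    // Positions are only materialized when they were requested up front;
    // otherwise null is returned, as the other scorers do in the first cut.
    PositionIterator positions() {
        return flags.needsPositions ? new PositionIterator(docPositions) : null;
    }
}
```

Keeping the decision in a flag object passed at scorer creation time means a plain conjunction or disjunction never pays for decoding positions it will not read.
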
To show that the API really works and that our BulkPostings work fine with positions too, I cut over SpanTermQuery to use a TermScorer under the hood and nuked TermSpans entirely. A nice side effect is that the positional BulkReading implementation got some exercise, and it now all works with positions, while payloads for bulk reading are somewhat experimental in the patch and only work with the Standard codec.
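
Building SpanTermQuery on a TermScorer works because a single-term span carries no information beyond the term's position: each occurrence at position p is the half-open span [p, p + 1). The sketch below shows that adaptation in isolation; `TermSpansAdapter` is a hypothetical name, and it reads from a plain int array rather than from the BulkPostings API used in the patch.

```java
// Hypothetical adapter: exposes the positions of one term in the current
// document as spans, where an occurrence at position p covers [p, p + 1).
final class TermSpansAdapter {
    private final int[] positions; // ascending positions of the term in this doc
    private int upto = -1;

    TermSpansAdapter(int[] positions) {
        this.positions = positions.clone();
    }

    // Advances to the next occurrence; returns false once all are consumed.
    boolean next() {
        upto++;
        return upto < positions.length;
    }

    // Start of the current span: the term's position.
    int start() {
        return positions[upto];
    }

    // End of the current span (exclusive): one past the term's position.
    int end() {
        return positions[upto] + 1;
    }
}
```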

So all spans now work on top of TermScorer (as of today I truly hate spans), including the ones that need payloads (StandardCodec ONLY)!! I didn't bother to implement the other codecs yet, since I want feedback on the API and on this first cut before I go on with it. I will upload the corresponding patch in a minute.
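
Once term scorers expose positions, a SpanNear-style match reduces to comparing the position lists of its sub-clauses, and the same information is what a proximity-aware score would consume. The sketch below checks whether two terms occur within a given slop of each other in one document. `NearSketch` and its methods are hypothetical; it handles only two single-term clauses in unordered mode, and its slop definition (number of positions strictly between the two terms) is an assumption meant to resemble, not reproduce, SpanNearQuery.

```java
// Hypothetical two-term proximity check over ascending position arrays.
final class NearSketch {
    // Smallest |a[i] - b[j]| over two ascending arrays, found with a linear
    // two-pointer merge instead of comparing every pair.
    static int minDistance(int[] a, int[] b) {
        int i = 0, j = 0;
        int best = Integer.MAX_VALUE;
        while (i < a.length && j < b.length) {
            int d = Math.abs(a[i] - b[j]);
            if (d < best) best = d;
            // Advance the pointer at the smaller position: only a later value
            // from that array can move closer to the other array's current value.
            if (a[i] < b[j]) i++; else j++;
        }
        return best;
    }

    // Unordered "near" test: the positions strictly between the two terms
    // must not exceed slop. Returns false if either term is absent.
    static boolean near(int[] a, int[] b, int slop) {
        if (a.length == 0 || b.length == 0) return false;
        return Math.max(0, minDistance(a, b) - 1) <= slop;
    }
}
```

For example, with positions {1, 10} and {4, 12} the closest pair is 10 and 12, so the terms match with slop 1 but not with slop 0.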

I also had to cut over SpanQuery.getSpans(IR) to SpanQuery.getSpans(AtomicReaderContext), which I should probably do on trunk first, but after today's pain I need a break.
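
Moving getSpans from a top-level IndexReader to an AtomicReaderContext means spans are produced per segment, with segment-local document ids; a caller that needs index-wide ids adds the segment's offset, which AtomicReaderContext exposes as docBase. The sketch below models that translation with hypothetical `SegmentSketch` and `PerSegmentSearchSketch` classes rather than real Lucene types.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a leaf context: segment-local matching doc ids plus
// the offset of this segment within the whole index (cf. docBase).
final class SegmentSketch {
    final int docBase;
    final int[] localMatches; // segment-local doc ids that matched, ascending

    SegmentSketch(int docBase, int[] localMatches) {
        this.docBase = docBase;
        this.localMatches = localMatches.clone();
    }
}

final class PerSegmentSearchSketch {
    // Visit each segment separately, as a per-context getSpans would, and
    // translate local doc ids into index-wide ids by adding docBase.
    static List<Integer> globalMatches(List<SegmentSketch> leaves) {
        List<Integer> out = new ArrayList<>();
        for (SegmentSketch leaf : leaves) {
            for (int localDoc : leaf.localMatches) {
                out.add(leaf.docBase + localDoc);
            }
        }
        return out;
    }
}
```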

The patch passes all core tests (org.apache.lucene.search.highlight.HighlighterTest still fails, but I haven't looked into the MemoryIndex BulkPostings API yet).

Attachments

- PosHighlighter.patch (30/Jun/11 03:27, 21 kB, Michael Sokolov)
- PosHighlighter.patch (02/Jul/11 16:49, 22 kB, Michael Sokolov)
- LUCENE-2878-vs-trunk.patch (28/Oct/12 10:59, 395 kB, Simon Willnauer)
- LUCENE-2878-OR.patch (02/Jul/11 16:44, 125 kB, Michael Sokolov)
- LUCENE-2878.patch (21/Jan/11 16:57, 101 kB, Simon Willnauer)
- LUCENE-2878.patch (24/Jan/11 17:45, 45 kB, Simon Willnauer)
- LUCENE-2878.patch (27/Jan/11 15:24, 73 kB, Simon Willnauer)
- LUCENE-2878.patch (31/Jan/11 16:48, 119 kB, Simon Willnauer)
- LUCENE-2878.patch (09/Jul/11 17:15, 29 kB, Michael Sokolov)
- LUCENE-2878.patch (10/Jul/11 21:38, 49 kB, Michael Sokolov)
- LUCENE-2878.patch (12/Jul/11 13:51, 84 kB, Simon Willnauer)
- LUCENE-2878.patch (14/Jul/11 02:34, 115 kB, Michael Sokolov)
- LUCENE-2878.patch (24/May/12 13:22, 54 kB, Alan Woodward)
- LUCENE-2878.patch (24/May/12 21:31, 17 kB, Alan Woodward)
- LUCENE-2878.patch (25/May/12 08:21, 36 kB, Simon Willnauer)
- LUCENE-2878.patch (25/May/12 08:29, 41 kB, Simon Willnauer)
- LUCENE-2878.patch (25/May/12 18:56, 5 kB, Alan Woodward)
- LUCENE-2878.patch (28/May/12 07:50, 11 kB, Alan Woodward)
- LUCENE-2878.patch (28/May/12 09:55, 13 kB, Simon Willnauer)
- LUCENE-2878.patch (13/Jun/12 13:29, 23 kB, Alan Woodward)
- LUCENE-2878.patch (13/Jun/12 15:50, 34 kB, Alan Woodward)
- LUCENE-2878.patch (14/Jun/12 16:26, 37 kB, Alan Woodward)
- LUCENE-2878.patch (06/Jul/12 15:02, 9.49 MB, Alan Woodward)
- LUCENE-2878.patch (18/Jul/12 20:42, 11 kB, Alan Woodward)
- LUCENE-2878.patch (24/Oct/12 15:59, 27 kB, Alan Woodward)
- LUCENE-2878.patch (26/Oct/12 13:38, 14 kB, Alan Woodward)
- LUCENE-2878.patch (28/Oct/12 21:38, 3 kB, Michael McCandless)
- LUCENE-2878.patch (04/Nov/12 22:51, 9 kB, Alan Woodward)
- LUCENE-2878.patch (24/May/14 22:01, 1.14 MB, Alan Woodward)
- LUCENE-2878.patch (29/May/14 09:08, 1.15 MB, Alan Woodward)
- LUCENE-2878.patch (18/Sep/14 10:44, 623 kB, Alan Woodward)
- LUCENE-2878.patch (05/Dec/14 15:47, 690 kB, Alan Woodward)
- LUCENE-2878.patch (08/Dec/14 16:13, 655 kB, Alan Woodward)
- LUCENE-2878.patch (16/Dec/14 11:39, 646 kB, Alan Woodward)
- LUCENE-2878_trunk.patch (13/Jun/11 16:11, 119 kB, Simon Willnauer)
- LUCENE-2878_trunk.patch (14/Jun/11 07:06, 120 kB, Simon Willnauer)

Issue Links

relates to: LUCENE-4524 Merge DocsEnum and DocsAndPositionsEnum into PostingsEnum (Closed)

Sub-Tasks

1. Explore Proximity Scoring (Open, Unassigned)
2. Intervals don't record field information (Open, Unassigned)
