[LUCENE-2309] Fully decouple IndexWriter from analyzers - ASF JIRA

Details

Type: Improvement
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 4.0-ALPHA
Component/s: core/index
Labels:

Lucene Fields: New

Description

IndexWriter only needs an AttributeSource to do indexing.

Yet today it interacts with Field instances, holds a private
analyzer, invokes analyzer.reusableTokenStream, and has to deal with a
wide variety of field cases (not analyzed; analyzed from a Reader or a
String; pre-analyzed).

I'd like to have IW only interact with attr sources that already
arrived with the fields. This would be a powerful decoupling – it
means others are free to make their own attr sources.

They need not even use any of Lucene's analysis impls; eg they can
integrate with other things like OpenPipeline, or make something
completely custom.
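
To make the proposed decoupling concrete, here is a minimal, self-contained sketch. It is not Lucene code: `TermSource` and `SketchWriter` are hypothetical stand-ins for an attribute source and for IW, and a real AttributeSource carries more than term bytes. The only point it illustrates is that the writer consumes whatever source arrives with the field and never sees an analyzer, a String, or a Reader.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// All names here are hypothetical; this is not Lucene's API.
public class DecoupledIndexingSketch {

    /** Stand-in for an attribute source: a stream of terms exposed as bytes. */
    interface TermSource {
        boolean next();      // advance to the next term; false when exhausted
        byte[] termBytes();  // bytes of the current term
    }

    /** Stand-in for IW: it only pulls term bytes from the source it is given. */
    static final class SketchWriter {
        private final List<String> postings = new ArrayList<>();

        void addField(String fieldName, TermSource source) {
            while (source.next()) {
                String term = new String(source.termBytes(), StandardCharsets.UTF_8);
                postings.add(fieldName + ":" + term);
            }
        }

        List<String> postings() {
            return postings;
        }
    }

    /** A custom source that splits on whitespace, with no analysis library involved. */
    static TermSource whitespaceSource(String text) {
        String trimmed = text.trim();
        final String[] parts = trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
        return new TermSource() {
            private int pos = -1;
            public boolean next() { return ++pos < parts.length; }
            public byte[] termBytes() { return parts[pos].getBytes(StandardCharsets.UTF_8); }
        };
    }

    /** A "pre-analyzed" source: the caller supplies the terms directly. */
    static TermSource preAnalyzed(List<String> terms) {
        final Iterator<String> it = terms.iterator();
        return new TermSource() {
            private String current;
            public boolean next() {
                if (!it.hasNext()) return false;
                current = it.next();
                return true;
            }
            public byte[] termBytes() { return current.getBytes(StandardCharsets.UTF_8); }
        };
    }

    public static void main(String[] args) {
        SketchWriter writer = new SketchWriter();
        writer.addField("body", whitespaceSource("hello attribute sources"));
        writer.addField("tags", preAnalyzed(Arrays.asList("lucene", "indexing")));
        System.out.println(writer.postings());
    }
}
```

Both sources go through the same `addField` path, so the writer needs no per-case branching.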

~~LUCENE-2302~~ is already a big step towards this: it makes IW agnostic
about which attr is "the term", and only requires that it provide a
BytesRef (for flex).

Then I think ~~LUCENE-2308~~ would get us most of the remaining way – ie, if the
FieldType knows the analyzer to use, then we could simply create a
getAttrSource() method (say) on it and move all the logic IW has today
onto there. (We'd still need existing IW code for back-compat).
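
As a rough illustration of that idea, the sketch below moves the per-case branching (not analyzed, analyzed String or Reader, pre-analyzed) onto a field type. Only the method name `getAttrSource` comes from the suggestion above. `SketchFieldType`, the `Function`-based analyzer, and the `Iterator<String>` return type are assumptions made for brevity, not Lucene's API, and the attached patches may take a different route.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.Locale;
import java.util.function.Function;

// All names except getAttrSource are hypothetical; this is not Lucene's API.
public class FieldTypeSketch {

    /** Hypothetical field type that owns the analysis step. */
    static final class SketchFieldType {
        private final Function<String, List<String>> analyzer; // null means "not analyzed"

        SketchFieldType(Function<String, List<String>> analyzer) {
            this.analyzer = analyzer;
        }

        /** One entry point for every value kind, so the writer needs no branching. */
        Iterator<String> getAttrSource(Object value) {
            if (value instanceof List) {              // pre-analyzed: pass the terms through
                List<String> terms = new ArrayList<>();
                for (Object o : (List<?>) value) {
                    terms.add(String.valueOf(o));
                }
                return terms.iterator();
            }
            String text = (value instanceof Reader) ? readAll((Reader) value) : String.valueOf(value);
            if (analyzer == null) {                   // not analyzed: the whole value is one term
                return Collections.singletonList(text).iterator();
            }
            return analyzer.apply(text).iterator();   // analyzed String or Reader
        }

        private static String readAll(Reader reader) {
            StringBuilder sb = new StringBuilder();
            char[] buf = new char[1024];
            try {
                int n;
                while ((n = reader.read(buf)) != -1) {
                    sb.append(buf, 0, n);
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            return sb.toString();
        }
    }

    /** Collects a source into a list; stands in for what the writer would consume. */
    static List<String> drain(Iterator<String> source) {
        List<String> out = new ArrayList<>();
        while (source.hasNext()) {
            out.add(source.next());
        }
        return out;
    }

    public static void main(String[] args) {
        // Toy analyzer: lowercase, then split on whitespace.
        SketchFieldType analyzed = new SketchFieldType(
                text -> Arrays.asList(text.toLowerCase(Locale.ROOT).split("\\s+")));
        SketchFieldType notAnalyzed = new SketchFieldType(null);

        System.out.println(drain(analyzed.getAttrSource("Hello World")));
        System.out.println(drain(analyzed.getAttrSource(new StringReader("From A Reader"))));
        System.out.println(drain(notAnalyzed.getAttrSource("Keep Me Whole")));
        System.out.println(drain(analyzed.getAttrSource(Arrays.asList("x", "y"))));
    }
}
```

With the branching on the field type, the writer can treat every field the same way: ask for a source and consume it.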

Attachments

- LUCENE-2309-getTSFromField.patch (19 kB), Chris Male, 22/Sep/11 08:47
- LUCENE-2309-analyzer-based.patch (13 kB), Chris Male, 18/Jul/11 10:14
- LUCENE-2309.patch (14 kB), Chris Male, 17/Jul/11 13:50

People

Assignee: Chris Male

Reporter: Michael McCandless

Votes: 1

Watchers: 3

Dates

Created: 10/Mar/10 20:34

Updated: 28/Aug/22 12:21

Resolved: 23/Sep/11 16:23