Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 2.3.1
    • Fix Version/s: 2.4
    • Component/s: modules/analysis
    • Labels:
      None
    • Environment:

      All

    • Lucene Fields:
      New

      Description

      This was discussed in the thread (not sure which place is best to reference so here are two):
      http://mail-archives.apache.org/mod_mbox/lucene-java-dev/200805.mbox/%3C21F67CC2-EBB4-48A0-894E-FBA4AECC0D50@gmail.com%3E
      or to see it all at once:
      http://www.gossamer-threads.com/lists/lucene/java-dev/62851

      Issues:
      1. JavaDoc is insufficient, leading one to read the code to figure out how to use the class.
      2. Deprecations are incomplete. The constructors that take String as an argument and the methods that take and/or return String should all be deprecated.
       3. The allocation policy is too aggressive. With large tokens the resulting buffer can be significantly over-allocated. A less aggressive algorithm would be better; the Python example in the thread is good because it is computationally simple (see the sketch after this list).
       4. The parts of the code that currently use Token's deprecated methods can be upgraded now rather than waiting for 3.0. As it stands, filter chains that alternate between char[] and String are sub-optimal. In core, the deprecated methods are used by the Query classes; the remaining uses are in contrib, mostly in the analyzers.
      5. Some internal optimizations can be done with regard to char[] allocation.
       6. TokenStream has next() and next(Token); next() should be deprecated so that reuse is maximized, and descendant classes should be rewritten to override next(Token).
       7. Tokens are often stored as a String in a Term. It would be good to add constructors that take a Token. This would simplify the use of the two together.
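
       For item 3, a rough sketch of the kind of less aggressive growth policy suggested in the thread (the method name and its home in ArrayUtil are assumptions here, not the committed code):

         // Sketch only: grow by about 1/8 plus a small constant, Python-list style,
         // instead of doubling on every growth.
         public static int getNextSize(int targetSize) {
           return targetSize + (targetSize >> 3) + (targetSize < 9 ? 3 : 6);
         }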

      1. LUCENE-1333-xml-query-parser.patch
        4 kB
        DM Smith
      2. LUCENE-1333-wordnet.patch
        4 kB
        DM Smith
      3. LUCENE-1333-wikipedia.patch
        38 kB
        DM Smith
      4. LUCENE-1333-snowball.patch
        4 kB
        DM Smith
      5. LUCENE-1333-queries.patch
        5 kB
        DM Smith
      6. LUCENE-1333-miscellaneous.patch
        11 kB
        DM Smith
      7. LUCENE-1333-memory.patch
        11 kB
        DM Smith
      8. LUCENE-1333-lucli.patch
        1 kB
        DM Smith
      9. LUCENE-1333-instantiated.patch
        6 kB
        DM Smith
      10. LUCENE-1333-highlighter.patch
        10 kB
        DM Smith
      11. LUCENE-1333-core.patch
        23 kB
        DM Smith
      12. LUCENE-1333-analyzers.patch
        111 kB
        DM Smith
      13. LUCENE-1333-analysis.patch
        32 kB
        DM Smith
      14. LUCENE-1333a.txt
        19 kB
        DM Smith
      15. LUCENE-1333.patch
        19 kB
        Michael McCandless
      16. LUCENE-1333.patch
        25 kB
        DM Smith
      17. LUCENE-1333.patch
        292 kB
        DM Smith
      18. LUCENE-1333.patch
        327 kB
        Michael McCandless
      19. LUCENE-1333.patch
        341 kB
        Michael McCandless
      20. LUCENE-1333.patch
        343 kB
        Michael McCandless
      21. LUCENE-1333.patch
        343 kB
        Michael McCandless
      22. LUCENE-1333.patch
        415 kB
        DM Smith
      23. LUCENE-1333.patch
        415 kB
        Michael McCandless

        Issue Links

          Activity

          mikemccand Michael McCandless added a comment -

          I just committed this. Thanks DM! This was one humongous patch – almost 10K lines. I sure hope we didn't break anything.

          mikemccand Michael McCandless added a comment -

          Patch & changes look good, thanks DM.

          I attached new patch with tiny changes:

          • Fixed javadoc warnings
          • Removed extra unnecessary comparison in ArrayUtil.getShrinkSize

          I plan to commit in a day or two!

          dmsmith DM Smith added a comment -
          • Added reinit(Token ...) methods to initialize one token from another.
          • Improved hashCode.
          • Made Token next(Token) have a final argument everywhere it is implemented to clarify the "best practice" for reuse and did the same for helper methods in various classes. As part of this, I renamed the token retrieved by next(Token) to be nextToken.

          The typical pattern I used for TokenFilters is:

          public Token next(final Token reusableToken) {
            assert reusableToken != null;
            Token nextToken = input.next(reusableToken);
            if (nextToken == null)
              return null;
            .... Do something with nextToken ....
            return nextToken;
          }
          

          and for other TokenStreams:

          public Token next(final Token reusableToken) {
            assert reusableToken != null;
             .... Do something with reusableToken ....
             return reusableToken;
          }
          

          and for looping over a TokenStream:

          final Token reusableToken = new Token();
          for (Token nextToken = stream.next(reusableToken); nextToken != null; nextToken = stream.next(reusableToken)) {
              .... Do something with nextToken ....
          }
          
          • Improved Payload.clone() to avoid new if possible.
          • Removed last remaining calls to termText() in a *.jj file.
          mikemccand Michael McCandless added a comment -

          This is rather expensive. Integer.hashCode() merely returns its value. Constructing a new Integer is unnecessary.

          Duh, I didn't realize that's the hashCode function for Integer. I like your new hashCode.

          I'll probably add copyFrom(Token) as a means to initialize one token to have the same content as another. There are a couple of places that this is appropriate.

          I like this, but maybe name it reinit(Token)?

          Also, are the reinit methods used?

          I think we could use it in further places (I didn't search exhaustively).

          DM if you can pull together a patch w/ these fixes that'd be great!

          dmsmith DM Smith added a comment - - edited

          Regarding the implementation of hashCode:
          You are using the following:

            private static int hashCode(int i) {
              return new Integer(i).hashCode();
            }
          

          This is rather expensive. Integer.hashCode() merely returns its value. Constructing a new Integer is unnecessary.

          While adding Token's integer values in Token's hashCode is perfectly fine, it is not quite optimal and may cause unnecessary collisions.

          It might be better to pretend that Token's integer values are also in an array (using the ArrayUtil algorithm, this could be):

            public int hashCode() {
              initTermBuffer();
              int code = termLength;
              code = code * 31 + startOffset;
              code = code * 31 + endOffset;
              code = code * 31 + flags;
              code = code * 31 + positionIncrement;
              code = code * 31 + type.hashCode();
              code = (payload == null ? code : code * 31 + payload.hashCode());
              code = code * 31 + ArrayUtil.hashCode(termBuffer, 0, termLength);
              return code;
            }
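
          For reference, the ArrayUtil.hashCode(char[], int, int) helper assumed above could be as simple as a 31-based rolling hash over the used part of the buffer (a sketch, not necessarily the exact implementation):

            public static int hashCode(char[] array, int start, int end) {
              int code = 0;
              for (int i = end - 1; i >= start; i--)
                code = code * 31 + array[i];
              return code;
            }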
          

          Also, are the reinit methods used? If not, I'd like to work up a patch that uses them. (And I'll include the above in it.)
          (never mind. I see that they are! super! But I'm working up a patch for this and a couple of minor optimizations that affect Token)

          I'll probably add copyFrom(Token) as a means to initialize one token to have the same content as another. There are a couple of places that this is appropriate.

          gsingers Grant Ingersoll added a comment -

          I think the main performance issue with clone in the old Token was that it had to do the initTermBuffer before, even though clone already knows what size the buffer should be, etc. It looks like this is fixed now, so it may be a non-issue.

          doronc Doron Cohen added a comment -

          I'm not using SinkTokenizer (yet) so I'm not sure if the following is worth it, but...

          One simple possibility to avoid those extra clones is to add a way to notify the sink that reset() will not be called anymore.

          I.e. for each token consumed from now on, this is the last time it is consumed.

          This could be done in the constructor, or a special setter could be added for it, such as disableReset().
          There would not be an enableReset().
          When disabled, next() would not clone, and reset() would throw an exception.
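
          A minimal sketch of what that could look like on a sink (the class, field and method names here are assumptions for illustration, not SinkTokenizer's actual API):

          import java.util.ArrayList;
          import java.util.List;

          import org.apache.lucene.analysis.Token;
          import org.apache.lucene.analysis.TokenStream;

          public class SimpleSink extends TokenStream {
            private final List tokens = new ArrayList();
            private int index = 0;
            private boolean resetDisabled = false;

            public void add(Token t) {
              tokens.add(t.clone());            // the sink owns its own copies
            }

            /** Caller promises never to call reset() again, so next() may skip cloning. */
            public void disableReset() {
              resetDisabled = true;
            }

            public Token next(final Token reusableToken) {
              if (index >= tokens.size())
                return null;
              Token stored = (Token) tokens.get(index++);
              return resetDisabled ? stored : (Token) stored.clone();
            }

            public void reset() {
              if (resetDisabled)
                throw new IllegalStateException("reset() has been disabled for this sink");
              index = 0;
            }
          }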

          dmsmith DM Smith added a comment -

          Cloning Tokens is not cheap, as I recall. In fact, I seem to recall testing that it was cheaper to do a new. Now, maybe that is fixed in this issue, but I am not sure.

          I was going on hearsay when I uniformly used clone() rather than new when dealing with creating a deep copy of an existing token. I was under the impression that clone was faster than new to do equivalent work.

          The test is rather simple and worth doing before accepting this issue. I don't think I have time to do it today.

          The equivalent of clone is (done from memory, so this is close):
          Token token = new Token(oldToken.startOffset(), oldToken.endOffset(), oldToken.getFlags(), oldToken.type());
          token.setPositionIncrement(oldToken.positionIncrement());
          if (oldToken.getPayload() != null) {
          Payload p = new Payload(....); // Create a new Payload with a deep copy of the payload
          }

          While this might be faster, there are two flaws that clone avoids: clone has direct access to the parts and avoids method calls, and it is future-proof. If a new field is added to Token, it will automatically be carried forward.

          There are a couple of places in the code where:
          public Token(Token prototype) // only if new is faster
          and
          public void copyFrom(Token prototype)
          would beneficially solve these maintenance issues.
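
          Roughly, those two members would live in Token and could look like this (a sketch built only from Token's public getters/setters; the committed patch went with reinit-style methods instead):

          public Token(Token prototype) {
            this(prototype.startOffset(), prototype.endOffset(), prototype.type());
            copyFrom(prototype);
          }

          public void copyFrom(Token prototype) {
            // assumes the prototype's term buffer has been initialized
            setTermBuffer(prototype.termBuffer(), 0, prototype.termLength());
            setStartOffset(prototype.startOffset());
            setEndOffset(prototype.endOffset());
            setType(prototype.type());
            setFlags(prototype.getFlags());
            setPositionIncrement(prototype.getPositionIncrement());
            setPayload(prototype.getPayload()); // shares the Payload; clone it if a deep copy is needed
          }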

          But how do you cope with reset()?

          This problem is a bug in the existing code. Today, one can create a chain of TokenFilters, each of which calls input.next() or input.next(token), and any one of which modifies the return value. It does not matter which is invoked. If the token returned is held in a cache then the cache is corrupted. Every cache of Tokens needs to ensure that its contents are immutable on creation. It also needs to ensure that they remain immutable in use if the tokens can be served more than once.

          Two personal opinions:

          • Caches that don't implement reset should return cache.remove(0) [or equivalent] so it is clear that the cache can only be used once.
          • Caches should not be used except when it gives a clear performance advantage.
          gsingers Grant Ingersoll added a comment -

          Sorry, you guys are right. My bad. Does look like we improved some of the cloning costs, though.

          doronc Doron Cohen added a comment -

          But how do you cope with reset()?

          Consider this:

          • step 1: tokens added to the sink. They are cloned, so sink "owns" them.
          • step 2: sink is used, until exhausted (until its next(Token) returns null).
          • step 3: sink.reset() is invoked.
          • step 4: sink is used again, until exhaustion.

          Point is that if step 2 did not return clones, the consumer that invoked step 2 might have overwritten the tokens it got from step 2. Now in step 4 the sink "thinks" it returns clones of the original tokens, unaware that the consumer in step 2 modified them.

          gsingers Grant Ingersoll added a comment -

          My point is it is already cloned when it is added to the list, so now it is being cloned twice, and I think the second clone is extraneous. In other words, it already is returning a clone from next. The only way that Token could be changed is by doing it via the getTokens() method, which would have to be done outside of the Analysis process, which I would argue does not violate the reusability process.

          Cloning Tokens is not cheap, as I recall. In fact, I seem to recall testing that it was cheaper to do a new. Now, maybe that is fixed in this issue, but I am not sure.

          dmsmith DM Smith added a comment -

          The SinkTokenizer changes don't seem right to me. The token is already cloned in the add() method, no need to clone again in the next() call.

          At least, that's my understanding based on trying to piece together the changes being proposed here.

          It is because SinkTokenizer implements reset(). SinkTokenizer needs to ensure that each subsequent time it returns the same token, that token actually is the same.

          A token is not immutable, so SinkTokenizer needs to ensure that the tokens in its list are immutable.

          The token that is added to the list can be a reusable Token. If it is added without cloning, some other user of the token might change it and thus the list changes.

          When the token is removed from the list and returned from next, it is presumed to be a reusable token. (This is true whether next() or next(Token) is called.) If it does not return a clone then some other user of the token might change it and thus the internal list changes. This is only a problem when reset is implemented. This is because a single token can be returned more than once from the TokenStream.

          If SinkTokenizer did not implement reset, then next() would not need to create a clone.

          gsingers Grant Ingersoll added a comment -

          The SinkTokenizer changes don't seem right to me. The token is already cloned in the add() method, no need to clone again in the next() call.

          At least, that's my understanding based on trying to piece together the changes being proposed here.

          doronc Doron Cohen added a comment -

          Wait...
          "ant clean test-comtrib" was a bad idea, because there's a missing dependency of contrib-test in test-compile.
          So "ant clean compile-test test-contrib" doesn't have the compile errors above.
          The errors that got me started on this were in contrib/analyzers but with the new patch and after clean, all contrib tests passed!

          doronc Doron Cohen added a comment -

          Still with previous patch - all core tests passed, but then contrib compilation failed.
          I tried ant clean contrib-test (to not repeat the core tests) and still got compilation errors:

              [javac] Compiling 27 source files to build\contrib\analyzers\classes\test
              [javac] contrib\analyzers\src\test\org\apache\lucene\analysis\miscellaneous\TestSingleTokenTokenFilter.java:23: cannot find symbol
              [javac] symbol  : class LuceneTestCase
              [javac] location: package org.apache.lucene.util
              [javac] import org.apache.lucene.util.LuceneTestCase;
              [javac]                              ^
          

          In yesterday's version all tests passed.
          Let me check recent patch.

          mikemccand Michael McCandless added a comment -

          Woops, you're right – new patch attached.

          doronc Doron Cohen added a comment -

          I diffed current patch and previous one and all seems correct to me.

          One tiny thing in Token - in two locations there's this code:

          setTermBuffer(newTermBuffer, newTermOffset, newTermLength);
          setTermLength(newTermLength);
          

          The second call is redundant since the first call already sets termLength.
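
          For context, the relevant part of setTermBuffer(char[], int, int) behaves roughly like this (a sketch, not the exact source), which is why the extra call is redundant:

          public void setTermBuffer(char[] buffer, int offset, int length) {
            if (termBuffer == null || termBuffer.length < length)
              termBuffer = new char[length];  // the real code over-allocates via ArrayUtil
            System.arraycopy(buffer, offset, termBuffer, 0, length);
            termLength = length;              // the length is recorded here already
          }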

          mikemccand Michael McCandless added a comment -

          Attached new patch:

          • Fixed comments about caching & cloning
          • Added partial clone method, that clones all except replaces termBuffer & start/end offset; fixed NGramTokenFilter and CompoundWordTokenFilterBase to use this method
          • Fixed bug in PrecedenceQueryParser.jj that became obvious once I ran javacc

          I think we are close!

          mikemccand Michael McCandless added a comment -

          I'm unable to get javacc to run cleanly on PrecedenceQueryParser.jj.

          I figured this out: you have to use javacc 3.2 (not 4.0 or beyond) for contrib/misc, then "ant javacc" runs fine (with the patch from LUCENE-1353).

          doronc Doron Cohen added a comment -

          I also made the corresponding changes in their generated java files. While it would be good to fix the generation problem, you can compare the jj and java pairs to see that they match.

          I ended up doing the same.

          There's a patch for this in LUCENE-1353 ...

          dmsmith DM Smith added a comment -

          Caching flies in the face of reuse. I think that a comment needs to be somewhere to that effect.
          Putting Tokens into a collection requires that the reusable token be copied. I.e. via new or clone. One cannot directly store the reusable Token, i.e. the argument from Token next(Token), nor the value to be returned from it.

          If a caching TokenStream is also resettable, then that means that the tokens coming from it need to be protected from being changed. This means that they need to return a copy. (Perhaps comment reset() to that effect?)

          The only value I see in caching is if the computation of the token stream is so expensive that re-using it has a significant savings.

          (The current code does not take such care, which is a bug. This patch fixes it. It would have become obvious if the cache were used in the context of reuse.)

          Some TokenStreams cache Tokens internally (as a detail of their implementation) and then return them incrementally. Many of these can be rewritten to compute the next Token when next(Token) is called. This would improve both time and space usage.

          (I felt that such changes were outside the scope of this patch.)

          All this leads to my response to the NGram filter.

          The NGram filters could be improved in this fashion. This would eliminate the clone() problem noted above.

          But failing that, a variant of clone to solve the intermediate problem would work. So would using new Token(...). The problem with using new Token() is that it requires manual propagation of flags, payloads, offsets and types and is not resilient to future fields.
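
          To make the copy-before-caching point concrete, a consumer that stores tokens from a reuse-style stream could do something like this (illustrative only):

          List cache = new ArrayList();
          final Token reusableToken = new Token();
          for (Token nextToken = stream.next(reusableToken); nextToken != null; nextToken = stream.next(reusableToken)) {
            // deep copy before storing; nextToken may be overwritten by the next call to next(Token)
            cache.add(nextToken.clone());
          }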

          mikemccand Michael McCandless added a comment -

          Attached new patch w/ above changes.

          mikemccand Michael McCandless added a comment -

          CompoundWordTokenFilterBase - createToken() calls clone() which deep
          copies the char array, and then calls also setTermBuffer() which iterates the chars again.

          Maybe we need a variant of clone that takes a new term buffer & start/end offsets, and creates a new token but with the new term buffer & start/end offsets you've passed in? Analogous to reinit, but first making a cloned token.
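
          Something along these lines, perhaps (a sketch; the exact signature that ends up in the patch may differ):

          /** Clone everything except the term text and offsets, which are replaced. */
          public Token clone(char[] newTermBuffer, int newTermOffset, int newTermLength,
                             int newStartOffset, int newEndOffset) {
            Token t = new Token(newStartOffset, newEndOffset);
            t.setTermBuffer(newTermBuffer, newTermOffset, newTermLength);
            t.setType(type());
            t.setFlags(getFlags());
            t.setPositionIncrement(getPositionIncrement());
            if (getPayload() != null)
              t.setPayload((Payload) getPayload().clone());
            return t;
          }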

          mikemccand Michael McCandless added a comment -

          I forgot to say (in last patch): I also added Token.reinit() methods.

          I think that LUCENE-1350's failing test can be included here?

          OK I'll update this test case.

          I find the comment in TokenFilter about implementing next() vs. next(Token) confusing.
          Perhaps use here the same comment as in Tokenizer.

          OK I'll update comment in TokenFilter.

          I also made the corresponding changes in their generated java files. While it would be good to fix the generation problem, you can compare the jj and java pairs to see that they match.

          I ended up doing the same.

          Perhaps it would also be good to finish the canonical implementation and provide hashcode?

          OK I'll add.

          Perhaps an assert in a producer is a good idea. I find it easier to debug a failed assert than a null pointer exception.

          Agreed, I'll add.

          doronc Doron Cohen added a comment -

          I think that LUCENE-1350's failing test can be included here?

          TestSnowball.testFilterTokens() then changes to:

            public void testFilterTokens() throws Exception {
              final Token tok = new Token(2, 7, "wrd");
              tok.setTermBuffer("accents");
              tok.setPositionIncrement(3);
              Payload tokPayload = new Payload(new byte[]{0,1,2,3});
              tok.setPayload(tokPayload);
              int tokFlags = 77;
              tok.setFlags(tokFlags);
          
              SnowballFilter filter = new SnowballFilter(
                  new TokenStream() {
                    public Token next(Token token) {
                      return tok;
                    }
                  },
                  "English"
              );
          
              Token newtok = filter.next(new Token());
          
              assertEquals("accent", newtok.term());
              assertEquals(2, newtok.startOffset());
              assertEquals(7, newtok.endOffset());
              assertEquals("wrd", newtok.type());
              assertEquals(3, newtok.getPositionIncrement());
              assertEquals(tokFlags, newtok.getFlags());
              assertEquals(tokPayload, newtok.getPayload());
            }
          
          doronc Doron Cohen added a comment -

          All tests pass here.
          131 files were modified - I reviewed core and part of contrib, with minor comments only:

          • CompoundWordTokenFilterBase - createToken() calls clone() which deep
            copies the char array, and then calls also setTermBuffer() which iterates the chars again.
            This can come into play in some analyzers. Usually I would ignore things like this, but this
            issue is so much about reusing and avoiding unneeded reallocation that I decided to
            bring this up. Actually there is no reallocation, just re-copying, so maybe it is not too bad?
            • Similarly in NGramTokenFilter - seems termBuffer would be copied twice for each "secondary" token.
          • Cloning: behavior regarding cloning is modified in few places.
            • In SingleTokenTokenStream - adding cloning - I think it is correct.
            • In TokenTypeSinkTokenizer.add(Token) cloning was removed because it is
              taken care of in super.add(). I first thought it was a bug, but no, the patch is correct.
          • I find the comment in TokenFilter about implementing next() vs. next(Token) confusing.
            Perhaps use here the same comment as in Tokenizer.
          dmsmith DM Smith added a comment -

          Back from a trip. Would have jumped in to help, but it looks like you've found and fixed a couple of my mistakes/oversights. Thanks.

          Regarding the equals implementation. Perhaps it would also be good to finish the canonical implementation and provide hashcode? That would allow for storage in sets and maps. (Not terribly sure how that would be useful. But I can imagine sorting on startOffsets to organize the contents.)

          Regarding the changes to the jj files: I also made the corresponding changes in their generated java files. While it would be good to fix the generation problem, you can compare the jj and java pairs to see that they match.

          I agree that Token next(Token) should assume a non-null argument. Perhaps an assert in a producer is a good idea. I find it easier to debug a failed assert than a null pointer exception.

          doronc Doron Cohen added a comment -

          For PrecedenceQueryParser... there is a TODO comment in its build.xml saying something about it though!

          Yes, I saw that, but I don't have a javacc executable.
          Maybe I'll look at that build.xml later.

          OK I'll let you diff

          OK I can see them now... you were right as usual

          Patch applies cleanly and I'm running the test now, so far all looks good but my PC is not that fast and it will take some more time to complete. I hope to review later tonight or tomorrow morning.

          mikemccand Michael McCandless added a comment -

          For PrecedenceQueryParser, is there a way to use ant to run javacc on contrib/misc?

          I couldn't find a way... there is a TODO comment in its build.xml saying something about it though!

          ok now I'm curious cause I still can't see that bug...

          OK, I'll let you diff. And please look closely to see if there are any other bugs! These equals() methods are tricky!

          mikemccand Michael McCandless added a comment -

          OK new patch attached:

          • Created Token.equals and Payload.equals – this fixed
            TestSingleTokenTokenFilter.
          • Switched to the "final reusableToken" pattern.
          • Added to TokenStream.next javadoc stating that parameter should
            never be null
          • Fixed test failures (all tests should pass now)
          doronc Doron Cohen added a comment -

          ok now I'm curious cause I still can't see that bug...

          For PrecedenceQueryParser, is there a way to use ant to run javacc on contrib/misc?

          mikemccand Michael McCandless added a comment -

          here is the code for Token.equals() in case you didn't write that part yet

          Thanks Doron; I'll merge into my version. Both your version and my version had bugs, so it's great you posted yours! And I hope the merged result has no bugs.

          mikemccand Michael McCandless added a comment -

          I'm unable to get javacc to run cleanly on PrecedenceQueryParser.jj. It produces this:

          bash-3.2$ javacc ./src/java/org/apache/lucene/queryParser/precedence/PrecedenceQueryParser.jj
          Java Compiler Compiler Version 4.0 (Parser Generator)
          (type "javacc" with no arguments for help)
          Reading from file ./src/java/org/apache/lucene/queryParser/precedence/PrecedenceQueryParser.jj . . .
          org.javacc.parser.ParseException: Encountered "<<" at line 658, column 3.
          Was expecting one of:
              <STRING_LITERAL> ...
              "<" ...
              
          Detected 1 errors and 0 warnings.
          

          Does anyone else hit this?

          doronc Doron Cohen added a comment -

          I couldn't leave this without seeing that test passing... so here is the code for Token.equals() in case you didn't write that part yet -

            @Override
            public boolean equals(Object obj) {
              if (this == obj)
                return true;
             
              Token other = (Token) obj;
              
              return
                termLength == other.termLength &&
                startOffset == other.startOffset &&  
                endOffset == other.endOffset &&
                flags == other.flags &&
                positionIncrement == other.positionIncrement && 
                subEqual(termBuffer, other.termBuffer) &&
                subEqual(type, other.type) &&
                subEqual(payload, other.payload);
            }
          
            private boolean subEqual(Object o1, Object o2) {
              if (o1==null)
                return o2==null; 
              return o1.equals(o2);
            }
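
          One detail to watch when merging: subEqual(termBuffer, other.termBuffer) relies on equals() for a char[], which is reference equality, so the term text needs an element-wise comparison over the used length - for example (a sketch):

            private static boolean termEquals(char[] a, char[] b, int length) {
              if (a == b) return true;
              if (a == null || b == null) return false;
              for (int i = 0; i < length; i++)
                if (a[i] != b[i]) return false;
              return true;
            }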
          
          doronc Doron Cohen added a comment -

          This is technically a break in backwards compatibility, but I think it's OK?

          I think so.
          I feel good about this whole change; it will make the reuse chain clearer, especially once all deprecations can be removed.

          mikemccand Michael McCandless added a comment -

          So definitely, Token should implement equals().

          I agree. This is technically a break in backwards compatibility, but I think it's OK?

          doronc Doron Cohen added a comment -

          Hmm, indeed I do see this too - but this is because Token has never overridden "equals" right?

          Yes, you're right. I was under the impression for a moment that Object's equals() works like clone() and goes in one layer only... that's stupid of course, it just compares the object references (or nulls). I wonder how this test ever passed before... Oh, I see it now - trunk's SingleTokenTokenStream never called Token.clone(), while the patched version calls it twice. So definitely, Token should implement equals().

          mikemccand Michael McCandless added a comment -

          just need to clarify this in TokenStream.next(Token)'s javadocs.

          I agree; I'll do this in next patch iteration...

          doronc Doron Cohen added a comment -

          Actually I think one should never pass null to next(Token) API - ie a source token stream need not check for null.

          Funny I was about to comment on the same thing...

          In fact, when the reuse API was introduced I believe null was supposed to mean - "nothing to be reused, please just create your own".

          In the case of a producer that cannot reuse - say it creates its own implementation of Token - there is no point in the consumer creating tokens that will never be reused.

          But this also meant that in all the common cases, all tokenizers would need an additional if() to verify that the reusable token is not null. Not so nice.

          So yes, I agree with you, just need to clarify this in TokenStream.next(Token)'s javadocs.

          mikemccand Michael McCandless added a comment -

          Do you see this too?

          Hmm, indeed I do see this too – but this is because Token has never overridden "equals" right?

          doronc Doron Cohen added a comment -

          Thanks Mike!

          One more...

          While looking into the failure of TestSingleTokenTokenFilter I saw that if these lines are executed:

          Token t1 = new Token();
          Token t2 = (Token) t1.clone();
          boolean isEqual = t1.equals(t2);
          

          Surprisingly, isEqual = false

          Do you see this too?

          mikemccand Michael McCandless added a comment -

          (I think producer tokens probably should check and create a token if it is so. But I don't think that is what they do now.)

          Actually I think one should never pass null to next(Token) API – ie a source token stream need not check for null.

          mikemccand Michael McCandless added a comment -

          Seems start/end offset were lost in ChineseTokenizer.

          OK, I'll fold this into my changes to the patch. Thanks Doron!

          mikemccand Michael McCandless added a comment -

          I'm also seeing the test failures.

          mikemccand Michael McCandless added a comment -

          But I would still like to clarify what the TokenStream can assume. I think
          TokenStream cannot assume anything about the token it gets as input, and,
          once it returned a token, it cannot assume anything about how that token
          is used. So why should it not expect being passed the token it just returned?

          The upshot of all of this: producers don't care which token they reuse.

          I agree – technically speaking, whenever a Token is returned from a source/filter's next(Token) method, anything is allowed to happen to it (including any & all changes, and subsequent reuse in future calls to next(Token)), and so the current pattern will run correctly if all sources & filters are implemented correctly. This is the contract in the reuse API.

          It's just that it looks spooky, when you are consuming tokens, not to create & re-use your own reusable token. I think it's also possible (but not sure) that the JRE can compile/run the "single reusable token" pattern more efficienctly, since you are making many method calls with a constant (for the life time of the for loop) single argument, but this is pure speculation on my part...

          I think from a code-smell standpoint I'd still like to use the "single re-use" pattern when applicable. DM I'll make this change & post a new patch.
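
          As a usage sketch of that "single reusable token" pattern (illustrative only, assuming a plain WhitespaceAnalyzer; this is not code from the patch):

          import java.io.IOException;
          import java.io.StringReader;
          import java.util.ArrayList;
          import java.util.List;

          import org.apache.lucene.analysis.Token;
          import org.apache.lucene.analysis.TokenStream;
          import org.apache.lucene.analysis.WhitespaceAnalyzer;

          public class ReuseLoopExample {
            public static void main(String[] args) throws IOException {
              TokenStream stream = new WhitespaceAnalyzer()
                  .tokenStream("field", new StringReader("some example text"));

              // One Token allocated for the lifetime of the loop; the stream refills it
              // (or hands back a different Token) on each call to next(Token).
              final Token reusableToken = new Token();
              List terms = new ArrayList();
              for (Token token = stream.next(reusableToken);
                   token != null;
                   token = stream.next(reusableToken)) {
                // Copy the term out: the buffer may be overwritten on the next call,
                // so we must not hold on to it.
                terms.add(new String(token.termBuffer(), 0, token.termLength()));
              }
              System.out.println(terms); // [some, example, text]
            }
          }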

          Hide
          doronc Doron Cohen added a comment -

          Seems start/end offset were lost in ChineseTokenizer.
          Adding this in flush(Token) will fix it.

          token.setStartOffset(start);
          token.setEndOffset(start+length);
          

          Possibly true for other places where a new Token(...) constructor call taking start/end offsets was replaced by clear() and setTermBuffer().

          Hide
          doronc Doron Cohen added a comment -

          The new patch applies cleanly.
          A few tests are failing though - TestChineseTokenizer for one.
          I'm looking into it, might be that a problem is in the test.

          Hide
          doronc Doron Cohen added a comment -

          This 'final' pattern is indeed more clear about reuse.

          But still would like to clarify on what can the TokenStream assume. I think
          TokenStream cannot assume anything about the token it gets as input, and,
          once it returned a token, it cannot assume anything about how that token
          is used. So why should it not expect being passed the token it just returned?

          Hide
          dmsmith DM Smith added a comment -

          I'll give my analysis here. Feel free to make the change or kick it back to me to make it, if you think your pattern is best. (If I do it, it will be after this weekend.)

          I've tried to be consistent here. The prior pattern was inconsistent and often was:

           Token token = null;
            while ((token = input.next()) != null) {
          

          There were other variants, including "forever loops". As you noticed, I replaced these with a consistent for-loop pattern.
          There are two basic implementations of Token next(Token):
          1) Producer: These create tokens from input. Their pattern is to take their argument and call clear on it and then set startOffset, endOffset and type appropriately. Their assumption is that they have to start with a pristine token and that other than space, there is nothing about the token that is passed in that can be reused.

          2) Consumer: These "filter" their argument. Their only assumption is that in the call chain that there was a producer that created the token that they need to reuse. In this case, they typically will preserve startOffset and endOffset because those are to represent the position of the token in the input. They may refine type, flags and payload, but otherwise have to preserve them. Most typically, they will set the termBuffer. There are a few types of consumers. Here are some:
          a) Transformational Filters: They take their argument and transform its termBuffer. (A sketch of one appears at the end of this comment.)
          b) Splitting Filters: They take their argument and split the token into several. Sometimes they will return the original; other times just the parts. When creating these tokens, calling clone() on the prototype will preserve flags, payloads, start and end offsets and type. These clones are sometimes stored in a buffer, but sometimes are incrementally computed with each call to next(Token). With the latter, they will typically cache a clone of the passed in token. I think that, when possible, incremental computation is preferable, but at the cost of a less obvious implementation.
          c) Caching Filters: If their buffer is empty, they repeatedly call result = input.next(token), clone the result, and cache the clones in some collection. Once full, they return their buffer's content. If the caching filter is resettable, it must return clones of its content; otherwise, downstream consumers may change their arguments, disastrously.

          Callers of Token next(Token) have the responsibility of never calling with a null token. (I think producer tokens probably should check and create a token if it is so. But I don't think that is what they do now.)

          The upshot of all of this, Producers don't care which token they reuse. If it was from the original loop, or from the result of the last call to token = stream.next(token), both are equally good. The token pre-existed and needs to be fully reset. Consumers presume that the token was produced (or at least appropriately re-initialized and filled in) by a producer.

          Your form of the loop is very advisable in a few places, most typically with a loop within a loop, where the inner loop iterates over all the tokens in a stream. In this case, the final Token would be created outside the outer loop. Your pattern would encourage maximal reuse; with mine, the programmer would have to figure out when it was appropriate to do one or the other.

          The other value to your pattern is that next(Token) is always called with a non-null Token.

          I think that calling the token "result" is not the best. It is a bit confusing as it is not the result of calling next(Token). Perhaps, to make reuse acutely obvious:

           final Token reusableToken = new Token();
           for (Token token = stream.next(reusableToken); token != null; token = stream.next(reusableToken)) {
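
          To make the "transformational filter" case above concrete, here is a hedged sketch (the class name is invented for the example and this is not code from the patch):

          import java.io.IOException;
          import org.apache.lucene.analysis.Token;
          import org.apache.lucene.analysis.TokenFilter;
          import org.apache.lucene.analysis.TokenStream;

          // A consumer of type (a): it reuses the Token the producer filled in,
          // preserving offsets, type, flags and payload, and only rewrites the
          // term buffer (here, lower-casing it in place).
          public class ExampleLowerCaseFilter extends TokenFilter {
            public ExampleLowerCaseFilter(TokenStream input) {
              super(input);
            }

            public Token next(Token reusableToken) throws IOException {
              Token token = input.next(reusableToken);
              if (token == null) {
                return null; // end of stream
              }
              char[] buffer = token.termBuffer();
              int length = token.termLength();
              for (int i = 0; i < length; i++) {
                buffer[i] = Character.toLowerCase(buffer[i]);
              }
              return token;
            }
          }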
          
          Hide
          mikemccand Michael McCandless added a comment -

          DM, one pattern that makes me nervous is this, from QueryTermVector.java:

                    for (Token next = stream.next(new Token()); next != null; next = stream.next(next)) {
          

          I don't think you should be "recycling" that next and passing it back in the next time you call stream.next, because a TokenStream is not required to use the Token you had passed in and so suddenly you are potentially asking it to re-use a token it had previously returned, which it may not expect. Likely it won't matter but I think this is still safer:

                    final Token result = new Token();
                    for (Token next = stream.next(result); next != null; next = stream.next(result)) {
          
          Hide
          mikemccand Michael McCandless added a comment -

          OK since you pulled it all together under this issue, I think we should commit this one instead of LUCENE-1350. I'll review the [massive] patch – thanks DM!

          Hide
          dmsmith DM Smith added a comment -

          This patch includes all the previous ones.

          Note: It includes the functionality solving LUCENE-1350. If this patch is applied before LUCENE-1350, then that issue is resolved. If it is done after then the patch will need to be rebuilt.

          I did not do the "reuse" API mentioned in LUCENE-1350.

          Hide
          dmsmith DM Smith added a comment -

          This issue supersedes LUCENE-1350, incorporating all the changes in it. This either needs to go in after it or instead of it.

          Hide
          dmsmith DM Smith added a comment -

          All the code has been migrated to use the reuse interface. I have run all the tests and they pass. (There is a weird dependency: one test depends upon demo. I hacked that to work.) I did not test to see if the code was any slower or faster. I'm not sure how one would do that. I think it should be faster since it doesn't bounce back and forth between termText and termBuffer.

          I did not improve the code to use char[] instead of String. The places that can be improved call Token.setTermBuffer(String). I don't think these are necessary to this issue being resolved.

          There are a couple of other minor opportunities for char[] manipulation via o.a.l.util.ArrayUtil:

          Here and there, there is a need to remove leading and trailing whitespace from input before (sub-)tokenizing it. Currently this is done with String.trim(), even when working with char[]. It is sub-optimal as marshalling it into a String involves allocation and copying. Likewise, so does getting it out. It would be better to have int trim(char[]) which shifts leading spaces off the front and returns the length of the string without trailing spaces.
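
          A hedged sketch of the kind of helper meant here (it is not part of o.a.l.util.ArrayUtil; the extra length argument is an assumption so that a partially filled term buffer can be handled):

          // Hypothetical utility, illustrative only: trims whitespace in place,
          // shifting the remaining characters to the front, and returns the new length.
          public static int trim(char[] buffer, int length) {
            int start = 0;
            while (start < length && Character.isWhitespace(buffer[start])) {
              start++;
            }
            int end = length;
            while (end > start && Character.isWhitespace(buffer[end - 1])) {
              end--;
            }
            int trimmed = end - start;
            if (start > 0) {
              System.arraycopy(buffer, start, buffer, 0, trimmed);
            }
            return trimmed;
          }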

          There is a "randomize" routine that shuffles an array. While this is only used in one place, it appears to be general purpose array manipulation.

          Hide
          dmsmith DM Smith added a comment -

          for contrib xml-query-parser

          Hide
          dmsmith DM Smith added a comment -

          for contrib wordnet

          Hide
          dmsmith DM Smith added a comment -

          for contrib wikipedia

          Hide
          dmsmith DM Smith added a comment -

          for contrib queries

          Hide
          dmsmith DM Smith added a comment -

          for contrib misc

          Hide
          dmsmith DM Smith added a comment -

          for contrib memory

          Hide
          dmsmith DM Smith added a comment -

          for contrib lucli

          Hide
          dmsmith DM Smith added a comment -

          For contrib instantiated

          Hide
          dmsmith DM Smith added a comment -

          for contrib highlighter

          Hide
          dmsmith DM Smith added a comment -

          for contrib snowball

          Hide
          dmsmith DM Smith added a comment -

          This is for contrib analyzers.

          Hide
          dmsmith DM Smith added a comment -

          This patch covers the rest of core Lucene.

          Hide
          dmsmith DM Smith added a comment -

          I've broken up the changes to core and contrib into small patches. My reasoning for this is that it is a lot easier for me to keep up with other people's changes and re-issue a small patch.

          Apply LUCENE-1333.patch before any of the following. I don't know if there is an order dependency for any of the following.

          This one is for core in o.a.l.analysis but not the files contained in LUCENE-1333.patch

          Hide
          dmsmith DM Smith added a comment - - edited

          Better comments based upon migrating all of core and contrib. This patch replaces the prior two.

          Marked fields as deprecated, with documentation that they should be made private and setters/getters should be used instead.

          Hide
          mikemccand Michael McCandless added a comment -

          This patch looks good; thanks DM!

          I made a few small changes & attached a new rev of the patch:

          • Fixed Token.setTermLength to throw IllegalArgumentException if you
            pass in a length > termBuffer.length (a rough sketch follows this list)
          • Changed Token.growTermBuffer to use oal.util.ArrayUtil (it has
            that same growth logic)
          • Changed if statements in Token.growTermBuffer to first handle the
            [I think most frequent] case where termBuffer is already
            allocated.
          • Javadoc/whitespace
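
          A rough sketch of the setTermLength check from the first item (the field names are assumptions about Token's internals, not the committed code):

          public void setTermLength(int length) {
            if (length > termBuffer.length) {
              throw new IllegalArgumentException("length " + length
                  + " exceeds the size of the termBuffer (" + termBuffer.length + ")");
            }
            termLength = length;
          }
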
          Hide
          dmsmith DM Smith added a comment -

          The following has been addressed in this patch:
          1. JavaDoc is improved (as always, there is still room for improvement. For example, it says the field type is interned, but it is not.)

          2. Deprecated the Token constructors taking a String.

          3. Changed the allocation policy to be less aggressive (a sketch of this kind of policy appears at the end of this comment).

          5. Optimized the growing of the internal termBuffer immediately followed by storing a new term. In doing this, added setTermBuffer(String) and setTermBuffer(String, int, int). Setting from a String is roughly the same cost as setting from a char[].

          6. TokenStream's next() has been deprecated. The javadoc has been updated to recommend next(Token) over next().

          7. Rather than modifying Term to take a Token, public String term() has been added. With termText() still deprecated, this gives upgraders a clean choice to use term() or termBuffer(), with the knowledge of the performance differences.

          I also updated TestToken to test all the changes.

          Left to do: (I'd like to get a nod on whether I need to make further changes to the supplied patch before doing #4)
          4. Changing of the remainder of core and contrib to remove calls to deprecated Token and TokenStream methods, i.e. to use the reuse API.
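
          To make item 3's "less aggressive" concrete, here is a sketch of one such policy (illustrative only; the class and method names are invented, and this is not the code in the patch):

          // Grow by roughly half the current capacity rather than doubling, so a
          // large token does not over-allocate as badly.
          public final class GrowthPolicySketch {
            public static int nextSize(int currentCapacity, int neededSize) {
              if (currentCapacity >= neededSize) {
                return currentCapacity; // already big enough
              }
              int grown = currentCapacity + (currentCapacity >> 1); // ~1.5x
              return Math.max(grown, neededSize);
            }

            public static void main(String[] args) {
              // e.g. a 16-char buffer asked to hold 20 chars grows to 24
              System.out.println(nextSize(16, 20)); // prints 24
            }
          }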


            People

            • Assignee:
              mikemccand Michael McCandless
              Reporter:
              dmsmith DM Smith
            • Votes:
              0
              Watchers:
              1
