Key: LUCENE-1333
Type: Improvement
Status: Closed
Resolution: Fixed
Priority: Minor
Assignee: Michael McCandless
Reporter: DM Smith
Votes: 0
Watchers: 1
Lucene - Java

Token implementation needs improvements

Created: 11/Jul/08 05:19 PM   Updated: 11/Oct/08 12:49 PM
Component/s: Analysis
Affects Version/s: 2.3.1
Fix Version/s: 2.4

Time Tracking:
Not Specified

File Attachments (text files, each licensed for inclusion in ASF works; columns: name, date, uploader, size):
LUCENE-1333-analysis.patch 2008-08-04 08:07 PM DM Smith 32 kB
LUCENE-1333-analyzers.patch 2008-08-04 08:08 PM DM Smith 111 kB
LUCENE-1333-core.patch 2008-08-04 08:07 PM DM Smith 23 kB
LUCENE-1333-highlighter.patch 2008-08-04 08:09 PM DM Smith 10 kB
LUCENE-1333-instantiated.patch 2008-08-04 08:10 PM DM Smith 6 kB
LUCENE-1333-lucli.patch 2008-08-04 08:10 PM DM Smith 1 kB
LUCENE-1333-memory.patch 2008-08-04 08:11 PM DM Smith 11 kB
LUCENE-1333-miscellaneous.patch 2008-08-04 08:11 PM DM Smith 11 kB
LUCENE-1333-queries.patch 2008-08-04 08:12 PM DM Smith 5 kB
LUCENE-1333-snowball.patch 2008-08-04 08:09 PM DM Smith 4 kB
LUCENE-1333-wikipedia.patch 2008-08-04 08:12 PM DM Smith 38 kB
LUCENE-1333-wordnet.patch 2008-08-04 08:12 PM DM Smith 4 kB
LUCENE-1333-xml-query-parser.patch 2008-08-04 08:13 PM DM Smith 4 kB
LUCENE-1333.patch 2008-08-18 04:12 PM Michael McCandless 415 kB
LUCENE-1333.patch 2008-08-18 03:04 PM DM Smith 415 kB
LUCENE-1333.patch 2008-08-12 01:44 PM Michael McCandless 343 kB
LUCENE-1333.patch 2008-08-12 11:15 AM Michael McCandless 343 kB
LUCENE-1333.patch 2008-08-11 11:44 AM Michael McCandless 341 kB
LUCENE-1333.patch 2008-08-10 02:50 PM Michael McCandless 327 kB
LUCENE-1333.patch 2008-08-08 09:03 PM DM Smith 292 kB
LUCENE-1333.patch 2008-08-04 08:01 PM DM Smith 25 kB
LUCENE-1333.patch 2008-07-30 03:30 PM Michael McCandless 19 kB
LUCENE-1333a.txt 2008-07-14 09:28 PM DM Smith 19 kB
Environment: All
Issue Links:
Incorporates LUCENE-1350

Lucene Fields: New
Resolution Date: 20/Aug/08 02:40 PM


Description
This was discussed in the thread (not sure which place is best to reference so here are two):
http://mail-archives.apache.org/mod_mbox/lucene-java-dev/200805.mbox/%3C21F67CC2-EBB4-48A0-894E-FBA4AECC0D50@gmail.com%3E
or to see it all at once:
http://www.gossamer-threads.com/lists/lucene/java-dev/62851

Issues:
1. JavaDoc is insufficient, leading one to read the code to figure out how to use the class.
2. Deprecations are incomplete. The constructors that take String as an argument and the methods that take and/or return String should all be deprecated.
3. The allocation policy is too aggressive. With large tokens the resulting buffer can be over-allocated. A less aggressive algorithm would be better. In the thread, the Python example is good as it is computationally simple.
4. The parts of the code that currently use Token's deprecated methods can be upgraded now rather than waiting for 3.0. As it stands, filter chains that alternate between char[] and String are sub-optimal. In core, the deprecated methods are currently used by Query classes; the remaining uses are in contrib, mostly in analyzers.
5. Some internal optimizations can be done with regard to char[] allocation.
6. TokenStream has both next() and next(Token). next() should be deprecated so that reuse is maximized, and descendant classes should be rewritten to override next(Token).
7. Tokens are often stored as a String in a Term. It would be good to add Term constructors that take a Token, which would simplify using the two together.
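
Issue 3 can be illustrated with a minimal sketch. It assumes the "Python example" from the thread refers to the over-allocation rule CPython has used when growing lists (about one eighth extra plus a small constant). The class and method names below are hypothetical and are not taken from the attached patches.

```java
import java.util.Arrays;

// Hypothetical helper for illustration only; not part of Lucene's API.
final class GrowthSketch {

    // Mild over-allocation modeled on CPython's list growth rule (an assumption
    // about what the thread meant): about 12.5% headroom plus a small constant.
    // Overflow for sizes near Integer.MAX_VALUE is not handled in this sketch.
    static int nextSize(int targetSize) {
        return (targetSize >> 3) + (targetSize < 9 ? 3 : 6) + targetSize;
    }

    // Returns a buffer with room for at least minSize chars, keeping the
    // existing contents. The same array is returned when it is already big enough.
    static char[] grow(char[] buffer, int minSize) {
        if (buffer.length >= minSize) {
            return buffer;
        }
        return Arrays.copyOf(buffer, nextSize(minSize));
    }

    public static void main(String[] args) {
        // Compare against a simple doubling policy for a few requested sizes.
        for (int n : new int[] {8, 100, 10000}) {
            System.out.println(n + " chars: doubling -> " + (2 * n)
                    + ", sketch -> " + nextSize(n));
        }
    }
}
```

Under this rule a request for 10,000 chars allocates 11,256, where a doubling policy would allocate 20,000, so large tokens carry far less unused buffer space.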



Change History (ascending order)
DM Smith made changes - 14/Jul/08 09:28 PM
Field Original Value New Value
Attachment LUCENE-1333a.txt [ 12386015 ]
Michael McCandless made changes - 30/Jul/08 03:30 PM
Attachment LUCENE-1333.patch [ 12387201 ]
DM Smith made changes - 04/Aug/08 08:01 PM
Attachment LUCENE-1333.patch [ 12387509 ]
DM Smith made changes - 04/Aug/08 08:07 PM
Attachment LUCENE-1333-analysis.patch [ 12387511 ]
DM Smith made changes - 04/Aug/08 08:07 PM
Attachment LUCENE-1333-core.patch [ 12387512 ]
DM Smith made changes - 04/Aug/08 08:08 PM
Attachment LUCENE-1333-analyzers.patch [ 12387513 ]
DM Smith made changes - 04/Aug/08 08:09 PM
Attachment LUCENE-1333-snowball.patch [ 12387514 ]
DM Smith made changes - 04/Aug/08 08:09 PM
Attachment LUCENE-1333-highlighter.patch [ 12387515 ]
DM Smith made changes - 04/Aug/08 08:10 PM
Attachment LUCENE-1333-instantiated.patch [ 12387516 ]
DM Smith made changes - 04/Aug/08 08:10 PM
Attachment LUCENE-1333-lucli.patch [ 12387517 ]
DM Smith made changes - 04/Aug/08 08:11 PM
Attachment LUCENE-1333-memory.patch [ 12387518 ]
DM Smith made changes - 04/Aug/08 08:11 PM
Attachment LUCENE-1333-miscellaneous.patch [ 12387519 ]
DM Smith made changes - 04/Aug/08 08:12 PM
Attachment LUCENE-1333-queries.patch [ 12387520 ]
DM Smith made changes - 04/Aug/08 08:12 PM
Attachment LUCENE-1333-wikipedia.patch [ 12387521 ]
DM Smith made changes - 04/Aug/08 08:12 PM
Attachment LUCENE-1333-wordnet.patch [ 12387522 ]
DM Smith made changes - 04/Aug/08 08:13 PM
Attachment LUCENE-1333-xml-query-parser.patch [ 12387523 ]
DM Smith made changes - 06/Aug/08 09:48 PM
Link This issue depends on LUCENE-1350 [ LUCENE-1350 ]
DM Smith made changes - 08/Aug/08 09:03 PM
Attachment LUCENE-1333.patch [ 12387854 ]
Michael McCandless made changes - 10/Aug/08 02:50 PM
Attachment LUCENE-1333.patch [ 12387900 ]
Doron Cohen made changes - 11/Aug/08 09:22 AM
Link This issue depends on LUCENE-1350 [ LUCENE-1350 ]
Doron Cohen made changes - 11/Aug/08 09:24 AM
Link This issue incorporates LUCENE-1350 [ LUCENE-1350 ]
Michael McCandless made changes - 11/Aug/08 11:44 AM
Attachment LUCENE-1333.patch [ 12387946 ]
Michael McCandless made changes - 12/Aug/08 11:15 AM
Attachment LUCENE-1333.patch [ 12388039 ]
Michael McCandless made changes - 12/Aug/08 01:44 PM
Attachment LUCENE-1333.patch [ 12388048 ]
DM Smith made changes - 18/Aug/08 03:04 PM
Attachment LUCENE-1333.patch [ 12388439 ]
Michael McCandless made changes - 18/Aug/08 04:12 PM
Attachment LUCENE-1333.patch [ 12388460 ]
Michael McCandless made changes - 19/Aug/08 10:11 AM
Assignee Michael McCandless [ mikemccand ]
Michael McCandless made changes - 20/Aug/08 02:40 PM
Resolution Fixed [ 1 ]
Status Open [ 1 ] Resolved [ 5 ]
Michael McCandless made changes - 11/Oct/08 12:49 PM
Status Resolved [ 5 ] Closed [ 6 ]