  1. Lucene - Core
  2. LUCENE-1333

Token implementation needs improvements


    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 2.3.1
    • Fix Version/s: 2.4
    • Component/s: modules/analysis
    • Labels: None
    • Environment: All
    • Lucene Fields: New

      Description

      This was discussed in the following thread. Two links to the same discussion are given, since it is unclear which is the better reference:
      http://mail-archives.apache.org/mod_mbox/lucene-java-dev/200805.mbox/%3C21F67CC2-EBB4-48A0-894E-FBA4AECC0D50@gmail.com%3E
      or, to see the whole thread on one page:
      http://www.gossamer-threads.com/lists/lucene/java-dev/62851

      Issues:
      1. JavaDoc is insufficient, leading one to read the code to figure out how to use the class.
      2. Deprecations are incomplete. The constructors that take String as an argument and the methods that take and/or return String should all be deprecated.
      3. The allocation policy is too aggressive: with large tokens, the resulting buffer can be over-allocated. A less aggressive growth algorithm would be better. The Python example given in the thread is a good candidate, as it is computationally simple.
      4. The parts of the code that currently use Token's deprecated methods can be upgraded now rather than waiting for 3.0. As it stands, filter chains that alternate between char[] and String are sub-optimal. In core, the deprecated methods are currently used by the Query classes; the remaining uses are in contrib, mostly in analyzers.
      5. Some internal optimizations can be done with regard to char[] allocation.
      6. TokenStream has both next() and next(Token). next() should be deprecated so that reuse is maximized, and descendant classes should be rewritten to override next(Token).
      7. Tokens are often stored as a String in a Term. It would be good to add constructors that take a Token, which would simplify using the two together.
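
      To illustrate issue 3, here is a minimal sketch of a less aggressive growth policy. It assumes that "the Python example" refers to the over-allocation rule CPython uses when resizing lists (roughly 12.5% headroom plus a small constant). The class and method names are hypothetical and are not part of Lucene's API.

```java
import java.util.Arrays;

/** Hypothetical helper for illustration only; not a Lucene class. */
public class TokenBufferGrowth {

    /**
     * Returns the capacity to allocate when at least targetSize chars are needed.
     * Mirrors CPython's list growth rule: targetSize/8 extra, plus 3 or 6.
     */
    static int nextSize(int targetSize) {
        return (targetSize >> 3) + (targetSize < 9 ? 3 : 6) + targetSize;
    }

    /**
     * Returns a buffer that can hold at least minLength chars.
     * The existing buffer is returned unchanged if it is already large enough;
     * otherwise its contents are copied into a mildly over-allocated array.
     */
    static char[] grow(char[] buffer, int minLength) {
        if (buffer.length >= minLength) {
            return buffer;
        }
        return Arrays.copyOf(buffer, nextSize(minLength));
    }

    public static void main(String[] args) {
        char[] buf = new char[10];
        buf = grow(buf, 1000);
        // A 1000-char token gets 125 + 6 chars of headroom.
        System.out.println(buf.length); // prints 1131
    }
}
```

      For comparison, a policy that doubles a 10-char buffer until it fits 1000 chars would stop at 1280, so the gap between the two approaches widens as tokens get larger. The rule costs one shift, one comparison, and two additions, which matches the "computationally simple" requirement.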

        Attachments

        1. LUCENE-1333.patch
          415 kB
          Michael McCandless
        2. LUCENE-1333.patch
          415 kB
          DM Smith
        3. LUCENE-1333.patch
          343 kB
          Michael McCandless
        4. LUCENE-1333.patch
          343 kB
          Michael McCandless
        5. LUCENE-1333.patch
          341 kB
          Michael McCandless
        6. LUCENE-1333.patch
          327 kB
          Michael McCandless
        7. LUCENE-1333.patch
          292 kB
          DM Smith
        8. LUCENE-1333.patch
          25 kB
          DM Smith
        9. LUCENE-1333.patch
          19 kB
          Michael McCandless
        10. LUCENE-1333a.txt
          19 kB
          DM Smith
        11. LUCENE-1333-analysis.patch
          32 kB
          DM Smith
        12. LUCENE-1333-analyzers.patch
          111 kB
          DM Smith
        13. LUCENE-1333-core.patch
          23 kB
          DM Smith
        14. LUCENE-1333-highlighter.patch
          10 kB
          DM Smith
        15. LUCENE-1333-instantiated.patch
          6 kB
          DM Smith
        16. LUCENE-1333-lucli.patch
          1 kB
          DM Smith
        17. LUCENE-1333-memory.patch
          11 kB
          DM Smith
        18. LUCENE-1333-miscellaneous.patch
          11 kB
          DM Smith
        19. LUCENE-1333-queries.patch
          5 kB
          DM Smith
        20. LUCENE-1333-snowball.patch
          4 kB
          DM Smith
        21. LUCENE-1333-wikipedia.patch
          38 kB
          DM Smith
        22. LUCENE-1333-wordnet.patch
          4 kB
          DM Smith
        23. LUCENE-1333-xml-query-parser.patch
          4 kB
          DM Smith

              People

              • Assignee:
                mikemccand Michael McCandless
                Reporter:
                dmsmith DM Smith
              • Votes: 0
              • Watchers: 1
