Uploaded image for project: 'Lucene - Core'
  1. Lucene - Core
  2. LUCENE-4931

Make oal.document.Field reuse its internal StringTokenStream



    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Not A Problem
    • Affects Version/s: 4.0, 4.1, 4.2, 4.2.1
    • Fix Version/s: None
    • Component/s: core/index
    • Labels:
    • Lucene Fields:


      Followup from LUCENE-4930:
      Field.java has a private StringTokenStream which is used as TokenStream implementation for StringField (single value String tokens). Unfortunately this TokenStream is created on every new document/field while indexing, making the cost of creating the TS a significant time. With very old Java versions this also involves a lock in ReferenceQueue.poll() when called from addAttribute().

      In Lucene 3.x, DocInverterPerThread has a private thread-local AttributeSource for reusing, but because this was factored out to Field.java, we can no longer use CloseableThreadLocal (because Field are not Closeable). We should maybe move the special One-Token TokenStream back to DocInverterPerThread and just let Field.java delegate there. I know this would let us move back to 3.x where we had special handling of single token Fields in the indexer....

      Another approach would be to make Field.java use a static KeywordAnalyzer (it needs then be moved to core) or we add a ThreadLocal to Field.java (which may be expensive). Unfortunately this makes it hard to maintain, as the thread-localness is also needed to be bound to the IndexWriter instance. Because you could have 2 IndexWriters open at same time and add documents to both of them from one thread... This brings us back to my previous solution.


          Issue Links



              • Assignee:
                uschindler Uwe Schindler
                uschindler Uwe Schindler
              • Votes:
                0 Vote for this issue
                1 Start watching this issue


                • Created: