Lucene - Core / LUCENE-4931

Make oal.document.Field reuse its internal StringTokenStream


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Not A Problem
    • Affects Version/s: 4.0, 4.1, 4.2, 4.2.1
    • Fix Version/s: None
    • Component/s: core/index
    • Labels: None
    • Lucene Fields: New

    Description

      Followup from LUCENE-4930:
      Field.java has a private StringTokenStream, which is used as the TokenStream implementation for StringField (single-valued String tokens). Unfortunately, a new instance of this TokenStream is created for every document/field during indexing, so creating the TS accounts for a significant share of indexing time. On very old Java versions, this also involves acquiring a lock in ReferenceQueue.poll() when it is called from addAttribute().
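      To make the cost concrete, here is a minimal pure-Java model of a one-token stream, contrasting a fresh instance per field value with a single reused instance. SingleTokenStream, consume() and the instance counter are hypothetical illustration names, not Lucene's TokenStream API; in Lucene the expensive part is the construction-time addAttribute() work, which the counter merely stands in for.

```java
import java.util.ArrayList;
import java.util.List;

public class SingleTokenDemo {
    /** Hypothetical, simplified one-token stream (NOT Lucene's TokenStream API). */
    static final class SingleTokenStream {
        static int instancesCreated = 0; // counts constructions to compare strategies

        private String value;
        private boolean used;
        private String term; // stands in for a term attribute

        SingleTokenStream() {
            instancesCreated++; // in Lucene, construction is where attributes get added
        }

        void setValue(String value) { this.value = value; }

        void reset() { used = false; term = null; }

        /** Emits the whole value as exactly one token, then reports exhaustion. */
        boolean incrementToken() {
            if (used) return false;
            term = value;
            used = true;
            return true;
        }

        String term() { return term; }
    }

    /** Feeds one value through the stream and collects the tokens it produces. */
    static List<String> consume(SingleTokenStream ts, String value) {
        ts.setValue(value);
        ts.reset();
        List<String> tokens = new ArrayList<>();
        while (ts.incrementToken()) tokens.add(ts.term());
        return tokens;
    }

    public static void main(String[] args) {
        String[] ids = {"doc-1", "doc-2", "doc-3"};

        // Strategy A: a fresh stream for every field value (the behavior described above).
        SingleTokenStream.instancesCreated = 0;
        for (String id : ids) consume(new SingleTokenStream(), id);
        int perField = SingleTokenStream.instancesCreated;

        // Strategy B: one stream reused for every value.
        SingleTokenStream.instancesCreated = 0;
        SingleTokenStream shared = new SingleTokenStream();
        List<String> out = new ArrayList<>();
        for (String id : ids) out.addAll(consume(shared, id));
        int reused = SingleTokenStream.instancesCreated;

        // prints "per-field=3 reused=1 tokens=[doc-1, doc-2, doc-3]"
        System.out.println("per-field=" + perField + " reused=" + reused + " tokens=" + out);
    }
}
```

      Both strategies emit identical tokens; only the number of constructions differs, which is the overhead the issue is about.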

      In Lucene 3.x, DocInverterPerThread has a private thread-local AttributeSource for reuse, but because this logic was factored out into Field.java, we can no longer use CloseableThreadLocal (Field instances are not Closeable). We should perhaps move the special one-token TokenStream back to DocInverterPerThread and let Field.java delegate to it. I know this would take us back to the 3.x design, where the indexer had special handling for single-token Fields...
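      The delegation idea can be sketched as follows, assuming each indexing thread owns one per-thread component (the role DocInverterPerThread plays in 3.x). InverterPerThread, Field, OneTokenStream and their methods are hypothetical names for this model only; the issue was resolved as Not A Problem, so this is not code Lucene adopted.

```java
import java.util.ArrayList;
import java.util.List;

public class DelegationDemo {
    /** Hypothetical one-token stream for a single-valued string field. */
    static final class OneTokenStream {
        private String value;
        private boolean used;

        void setValue(String value) { this.value = value; used = false; }

        /** Returns the single token once, then null to signal the end. */
        String next() {
            if (used) return null;
            used = true;
            return value;
        }
    }

    /**
     * Hypothetical per-thread indexing component. Because one instance belongs to one
     * indexing thread, its reusable stream needs no ThreadLocal at all.
     */
    static final class InverterPerThread {
        private final OneTokenStream reusable = new OneTokenStream();

        OneTokenStream singleToken(String value) {
            reusable.setValue(value);
            return reusable;
        }

        List<String> invert(Field field) {
            OneTokenStream ts = field.tokenStream(this); // the Field delegates back here
            List<String> tokens = new ArrayList<>();
            for (String t = ts.next(); t != null; t = ts.next()) tokens.add(t);
            return tokens;
        }
    }

    /** Hypothetical Field that keeps no token stream of its own. */
    static final class Field {
        final String value;

        Field(String value) { this.value = value; }

        OneTokenStream tokenStream(InverterPerThread inverter) {
            return inverter.singleToken(value);
        }
    }

    public static void main(String[] args) {
        InverterPerThread inverter = new InverterPerThread();
        System.out.println(inverter.invert(new Field("id-1"))); // prints "[id-1]"
        System.out.println(inverter.invert(new Field("id-2"))); // prints "[id-2]"
        // The same OneTokenStream instance serves every field on this thread.
        System.out.println(inverter.singleToken("x") == inverter.singleToken("y")); // prints "true"
    }
}
```

      The design choice here is that ownership, not a ThreadLocal, provides the per-thread isolation, which sidesteps the Closeable problem mentioned above.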

      Another approach would be to make Field.java use a static KeywordAnalyzer (which would then need to be moved to core), or to add a ThreadLocal to Field.java (which may be expensive). Unfortunately, either is hard to maintain, because the thread-local state would also need to be bound to the IndexWriter instance: you could have two IndexWriters open at the same time and add documents to both of them from one thread... This brings us back to my previous solution.
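      Binding the thread-local state to a writer instance could look roughly like the sketch below. It is a simplified model loosely based on the idea behind Lucene's CloseableThreadLocal (the ThreadLocal holds only a weak reference, while the owner keeps the strong references so close() can release every thread's value at once); it is not that class's actual code. ClosableThreadLocal, Writer and the StringBuilder placeholder for a reusable token stream are hypothetical.

```java
import java.lang.ref.WeakReference;
import java.util.Map;
import java.util.WeakHashMap;
import java.util.function.Supplier;

public class PerWriterThreadLocalDemo {
    /** Hypothetical per-owner thread-local whose values can all be released on close(). */
    static final class ClosableThreadLocal<T> {
        // The ThreadLocal holds only weak references to the values...
        private volatile ThreadLocal<WeakReference<T>> local = new ThreadLocal<>();
        // ...while this map keeps them strongly reachable until close() or thread death.
        private final Map<Thread, T> hardRefs = new WeakHashMap<>();
        private final Supplier<T> factory;

        ClosableThreadLocal(Supplier<T> factory) { this.factory = factory; }

        T get() {
            ThreadLocal<WeakReference<T>> tl = local;
            if (tl == null) throw new IllegalStateException("closed");
            WeakReference<T> ref = tl.get();
            T value = (ref == null) ? null : ref.get();
            if (value == null) {
                value = factory.get();
                tl.set(new WeakReference<>(value));
                synchronized (hardRefs) { hardRefs.put(Thread.currentThread(), value); }
            }
            return value;
        }

        void close() {
            synchronized (hardRefs) { hardRefs.clear(); } // drop every thread's strong ref
            local = null; // later get() calls fail instead of resurrecting state
        }
    }

    /** Hypothetical writer that owns its per-thread reusable state. */
    static final class Writer implements AutoCloseable {
        final ClosableThreadLocal<StringBuilder> reusable =
                new ClosableThreadLocal<>(StringBuilder::new);

        @Override
        public void close() { reusable.close(); }
    }

    public static void main(String[] args) {
        try (Writer w1 = new Writer(); Writer w2 = new Writer()) {
            // One thread, two open writers: each writer hands out its own instance.
            System.out.println(w1.reusable.get() != w2.reusable.get()); // prints "true"
            // Repeated calls on the same writer and thread reuse one instance.
            System.out.println(w1.reusable.get() == w1.reusable.get()); // prints "true"
        }
    }
}
```

      A single static ThreadLocal in Field could not make this distinction between two writers on one thread, nor release state when one writer closes, which is the maintenance problem described above.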


            People

              Assignee: Uwe Schindler (uschindler)
              Reporter: Uwe Schindler (uschindler)
              Votes: 0
              Watchers: 0
