Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Not A Problem
- Affects Version/s: 4.0, 4.1, 4.2, 4.2.1
- None
- None
- New
Description
Follow-up to LUCENE-4930:
Field.java has a private StringTokenStream that serves as the TokenStream implementation for StringField (single-value String tokens). Unfortunately, this TokenStream is created anew for every document/field during indexing, so the cost of creating it accounts for a significant share of indexing time. On very old Java versions, creation also involves a lock in ReferenceQueue.poll() when called from addAttribute().
In Lucene 3.x, DocInverterPerThread had a private thread-local AttributeSource for reuse, but because this logic was factored out into Field.java, we can no longer use CloseableThreadLocal (Field instances are not Closeable). We should maybe move the special one-token TokenStream back to DocInverterPerThread and just let Field.java delegate to it. I know this would take us back to the 3.x design, where the indexer had special handling for single-token fields....
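To make the reuse idea concrete, here is a minimal sketch in plain Java. It does not use Lucene's classes: SingleTokenStream and PerThreadInverter are hypothetical stand-ins, loosely modeled on the one-token stream and the per-thread inverter described above, and the CREATED counter exists only to make allocations visible.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for a one-token stream; not Lucene's StringTokenStream. */
final class SingleTokenStream {
    // Counts how many stream objects were allocated (for illustration only).
    static final AtomicInteger CREATED = new AtomicInteger();

    private String value;
    private boolean exhausted = true;

    SingleTokenStream() {
        CREATED.incrementAndGet();
    }

    /** Re-points the existing stream at a new value instead of allocating a new stream. */
    void setValue(String value) {
        this.value = value;
        this.exhausted = false;
    }

    /** Returns the single token once, then null until setValue is called again. */
    String next() {
        if (exhausted) {
            return null;
        }
        exhausted = true;
        return value;
    }
}

/** Hypothetical per-thread inverter that owns exactly one reusable stream. */
final class PerThreadInverter {
    private final SingleTokenStream reusable = new SingleTokenStream();

    /** Processes one single-token field by resetting the shared stream. */
    String invert(String fieldValue) {
        reusable.setValue(fieldValue);
        return reusable.next();
    }
}
```

In this sketch, one PerThreadInverter allocates a single stream no matter how many field values it processes, which is the saving the proposal is after. Because the stream is owned by the per-thread indexing object rather than by the field, its lifetime follows that object.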
Another approach would be to make Field.java use a static KeywordAnalyzer (which would then need to be moved to core), or to add a ThreadLocal to Field.java (which may be expensive). Unfortunately, this would be hard to maintain, because the thread-local state also needs to be bound to the IndexWriter instance: you could have two IndexWriters open at the same time and add documents to both from one thread. This brings us back to my previous solution.
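The two-writer concern can be illustrated with a small plain-Java sketch. SketchWriter is a hypothetical class, not Lucene's IndexWriter, and the StringBuilder merely stands in for whatever reusable per-thread state the indexer would hold. The point it tries to show is that the ThreadLocal must be an instance field of the writer; a static ThreadLocal in a shared class would be shared by both writers on the same thread.

```java
/** Hypothetical writer used only to illustrate per-writer, per-thread state. */
final class SketchWriter implements AutoCloseable {
    // Instance field, not static: each writer gets its own slot per thread.
    private final ThreadLocal<StringBuilder> perThreadState =
            ThreadLocal.withInitial(StringBuilder::new);

    /** Appends to the calling thread's state for this writer only. */
    void addDocument(String value) {
        perThreadState.get().append(value).append(';');
    }

    /** Returns what the calling thread has added through this writer so far. */
    String stateForCurrentThread() {
        return perThreadState.get().toString();
    }

    /** Clears only the calling thread's slot for this writer. */
    @Override
    public void close() {
        perThreadState.remove();
    }
}
```

With two SketchWriter instances used alternately from one thread, each keeps its own state. Note that close() here releases only the calling thread's value, which hints at why a plain ThreadLocal is an awkward fit for state that must be released when a writer closes.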
Issue Links
- duplicates: LUCENE-4317 Field.java does not reuse its inlined Keyword-TokenStream (Closed)
- supersedes: LUCENE-4930 Lucene's use of WeakHashMap at index time prevents full use of cores on some multi-core machines, due to contention (Closed)