Index: lucene/CHANGES.txt
===================================================================
--- lucene/CHANGES.txt (revision 1097739)
+++ lucene/CHANGES.txt (working copy)
@@ -141,6 +141,7 @@
 * LUCENE-2315: AttributeSource's methods for accessing attributes are now final,
   else its easy to corrupt the internal states. (Uwe Schindler)
 
+
 Changes in Runtime Behavior
 
 * LUCENE-2846: omitNorms now behaves like omitTermFrequencyAndPositions, if you
@@ -168,6 +169,68 @@
   globally across IndexWriter sessions and persisted into a X.fnx file on
   successful commit. The corresponding file format changes are backwards-
   compatible. (Michael Busch, Simon Willnauer)
+
+* LUCENE-2956, LUCENE-2573, LUCENE-2324: Changes from DocumentsWriterPerThread:
+
+  - IndexWriter now uses a DocumentsWriter per thread when indexing documents.
+    Each DocumentsWriterPerThread indexes documents in its own private segment;
+    these in-memory segments are no longer merged in memory but are flushed
+    separately to disk and merged only after they have been flushed.
+
+  - DocumentsWriterPerThread (DWPT) instances are now flushed concurrently
+    based on FlushPolicy. By default the largest DWPT is selected and replaced
+    with a fresh DWPT instance. While the new DWPT continues indexing, the
+    selected DWPT flushes all its RAM-resident documents to disk.
+    Note: A segment flush doesn't flush all RAM-resident documents, only the
+    documents private to the flushing DWPT.
+
+  - Flushing is now controlled by FlushPolicy, which is called for every add,
+    update or delete on IndexWriter. By default DWPTs are flushed either on
+    maxBufferedDocs per DWPT or on the global active memory in use. Once the
+    active memory exceeds ramBufferSizeMB, only the largest DWPT is selected for
+    flushing; the memory used by this DWPT is subtracted from the active
+    memory and added to a flushing memory pool, which can lead to higher memory
+    usage during indexing.
+
+  - IndexWriter can now utilize ramBufferSize > 2048 MB. Each DWPT can address
+    up to 2048 MB of memory, so the ramBufferSize is now bounded by the max
+    number of DWPTs available in the used DocumentsWriterPerThreadPool.
+    IndexWriter's net memory consumption can grow far beyond the 2048 MB limit if
+    the application can use all available DWPTs. To prevent a DWPT from
+    exhausting its address space, IndexWriter will forcefully flush a DWPT if its
+    hard memory limit is exceeded. The RAMPerThreadHardLimitMB can be controlled
+    via IndexWriterConfig and defaults to 1945 MB.
+    Since IndexWriter flushes DWPTs concurrently, not all memory is released
+    immediately. Applications should still use a ramBufferSize significantly
+    lower than the JVM's available heap memory, since under high load multiple
+    flushing DWPTs can linger in memory, especially if IO performance is
+    poor.
+
+  - IndexWriter#commit no longer blocks concurrent indexing while flushing all
+    'currently' RAM-resident documents to disk. However, flushes that occur
+    while a full flush is in flight are queued and will happen after all DWPTs
+    involved in the full flush are done flushing. Applications that use multiple
+    threads for indexing and trigger full flushes while indexing can use
+    significantly more memory during a full flush.
+
+  - IndexWriter#addDocument and IndexWriter#updateDocument can block indexing
+    threads if the number of active DWPTs plus the number of flushing DWPTs
+    exceeds a safety limit. By default IW is considered stalled when 2 * the max
+    number of available thread states (DWPTPool) is exceeded. This safety limit
+    prevents applications from exhausting their available memory if flushing
+    can't keep up with concurrently indexing threads.
+
+  - IndexWriter only applies and flushes deletes if the maxBufferedDelTerms
+    limit is reached during indexing. No segment flushes will be triggered
+    due to this setting.
+
+  - IndexWriter#flush(boolean, boolean) doesn't synchronize on IndexWriter
+    anymore. A dedicated flushLock has been introduced to prevent multiple full-
+    flushes happening concurrently.
+
+  - DocumentsWriter doesn't write shared doc stores anymore.
+
+  (Mike McCandless, Michael Busch, Simon Willnauer)
 
 API Changes
 