Index: lucene/src/java/org/apache/lucene/index/IndexWriter.java =================================================================== --- lucene/src/java/org/apache/lucene/index/IndexWriter.java (revision 1097333) +++ lucene/src/java/org/apache/lucene/index/IndexWriter.java (working copy) @@ -56,17 +56,16 @@ /** An IndexWriter creates and maintains an index. -

The create argument to the {@link - #IndexWriter(Directory, IndexWriterConfig) constructor} determines +

The {@link OpenMode} option on + {@link IndexWriterConfig#setOpenMode(OpenMode)} determines whether a new index is created, or whether an existing index is - opened. Note that you can open an index with create=true - even while readers are using the index. The old readers will + opened. Note that you can open an index with {@link OpenMode#CREATE} + even while readers are using the index. The old readers will continue to search the "point in time" snapshot they had opened, - and won't see the newly created index until they re-open. There are - also {@link #IndexWriter(Directory, IndexWriterConfig) constructors} - with no create argument which will create a new index - if there is not already an index at the provided path and otherwise - open the existing index.

+ and won't see the newly created index until they re-open. If + {@link OpenMode#CREATE_OR_APPEND} is used, IndexWriter will create a + new index if there is not already an index at the provided path + and otherwise open the existing index.

In either case, documents are added with {@link #addDocument(Document) addDocument} and removed with {@link #deleteDocuments(Term)} or {@link @@ -78,15 +77,19 @@

These changes are buffered in memory and periodically flushed to the {@link Directory} (during the above method - calls). A flush is triggered when there are enough - buffered deletes (see {@link IndexWriterConfig#setMaxBufferedDeleteTerms}) - or enough added documents since the last flush, whichever - is sooner. For the added documents, flushing is triggered - either by RAM usage of the documents (see {@link - IndexWriterConfig#setRAMBufferSizeMB}) or the number of added documents. - The default is to flush when RAM usage hits 16 MB. For + calls). A flush is triggered when there are enough added documents + since the last flush. Flushing is triggered either by RAM usage of the + documents (see {@link IndexWriterConfig#setRAMBufferSizeMB}) or the + number of added documents (see {@link IndexWriterConfig#setMaxBufferedDocs(int)}). + The default is to flush when RAM usage hits + {@value IndexWriterConfig#DEFAULT_RAM_BUFFER_SIZE_MB} MB. For best indexing speed you should flush by RAM usage with a - large RAM buffer. Note that flushing just moves the + large RAM buffer. Additionally, if IndexWriter reaches the configured number of + buffered deletes (see {@link IndexWriterConfig#setMaxBufferedDeleteTerms}) + the deleted terms and queries are flushed and applied to existing segments. + In contrast to the other flush options {@link IndexWriterConfig#setRAMBufferSizeMB} and + {@link IndexWriterConfig#setMaxBufferedDocs(int)}, deleted terms + won't trigger a segment flush. Note that flushing just moves the internal buffered state in IndexWriter into the index, but these changes are not visible to IndexReader until either {@link #commit()} or {@link #close} is called. A flush may @@ -1247,7 +1250,8 @@ /** * Deletes the document(s) containing any of the - * terms. All deletes are flushed at the same time. + * terms. All given deletes are applied and flushed atomically + * at the same time. * *

NOTE: if this method hits an OutOfMemoryError * you should immediately close the writer. See NOTE: if this method hits an OutOfMemoryError * you should immediately close the writer. See Disabled by default (writer flushes by RAM usage). - * + * in-memory delete terms and queries are applied and flushed. + *

Disabled by default (writer flushes by RAM usage).

+ *

+ * NOTE: This setting won't trigger a segment flush. + *

+ * * @throws IllegalArgumentException if maxBufferedDeleteTerms * is enabled but smaller than 1 * @see #setRAMBufferSizeMB @@ -372,8 +372,8 @@ } /** - * Returns the number of buffered deleted terms that will trigger a flush if - * enabled. + * Returns the number of buffered deleted terms that will trigger a flush of all + * buffered deletes if enabled. * * @see #setMaxBufferedDeleteTerms(int) */ @@ -406,8 +406,10 @@ * way to measure the RAM usage of individual Queries so the accounting will * under-estimate and you should compensate by either calling commit() * periodically yourself, or by using {@link #setMaxBufferedDeleteTerms(int)} - * to flush by count instead of RAM usage (each buffered delete Query counts - * as one). + * to flush and apply buffered deletes by count instead of RAM usage + * (for each buffered delete Query a constant number of bytes is used to estimate + * RAM usage). Note that enabling {@link #setMaxBufferedDeleteTerms(int)} will + * not trigger any segment flushes. *

* NOTE: It's not guaranteed that all memory resident documents are flushed * once this limit is exceeded. Depending on the configured {@link FlushPolicy} only a @@ -417,6 +419,7 @@ * * The default value is {@link #DEFAULT_RAM_BUFFER_SIZE_MB}. * @see #setFlushPolicy(FlushPolicy) + * @see #setRAMPerThreadHardLimitMB(int) * *

Takes effect immediately, but only the next time a * document is added, updated or deleted. @@ -537,24 +540,43 @@ return mergePolicy; } - /** - * Sets the max number of simultaneous threads that may be indexing documents - * at once in IndexWriter. Values < 1 are invalid and if passed - * maxThreadStates will be set to - * {@link #DEFAULT_MAX_THREAD_STATES}. - * - *

Only takes effect when IndexWriter is first created. */ + /** Sets the {@link DocumentsWriterPerThreadPool} instance used by the + * IndexWriter to assign thread-states to incoming indexing threads. If no + * {@link DocumentsWriterPerThreadPool} is set {@link IndexWriter} will use + * {@link ThreadAffinityDocumentsWriterThreadPool} with max number of + * thread-states set to {@value #DEFAULT_MAX_THREAD_STATES} (see + * {@link #DEFAULT_MAX_THREAD_STATES}). + *

+ *

+ * NOTE: The given {@link DocumentsWriterPerThreadPool} instance must not be used with + * other {@link IndexWriter} instances once it has been initialized / associated with an + * {@link IndexWriter}. + *

+ *

+ * NOTE: This only takes effect when IndexWriter is first created.

*/ public IndexWriterConfig setIndexerThreadPool(DocumentsWriterPerThreadPool threadPool) { + if(threadPool == null) { + throw new IllegalArgumentException("DocumentsWriterPerThreadPool must not be null"); + } this.indexerThreadPool = threadPool; return this; } + /** Returns the configured {@link DocumentsWriterPerThreadPool} instance. + * @see #setIndexerThreadPool(DocumentsWriterPerThreadPool) + * @return the configured {@link DocumentsWriterPerThreadPool} instance.*/ public DocumentsWriterPerThreadPool getIndexerThreadPool() { return this.indexerThreadPool; } - /** Returns the max number of simultaneous threads that - * may be indexing documents at once in IndexWriter. */ + /** Returns the max number of simultaneous threads that may be indexing + * documents at once in IndexWriter. + *

+ * To modify the max number of thread-states a new + * {@link DocumentsWriterPerThreadPool} must be set via + * {@link #setIndexerThreadPool(DocumentsWriterPerThreadPool)}. + *

+ * @see #setIndexerThreadPool(DocumentsWriterPerThreadPool) */ public int getMaxThreadStates() { return indexerThreadPool.getMaxThreadStates(); }
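
Reviewer note: the Javadoc in this patch separates two kinds of triggers. RAM usage and the buffered document count trigger a segment flush, while the buffered delete-term count only causes buffered deletes to be applied. The sketch below is a hypothetical, self-contained model of that rule, not Lucene code. The class `FlushTriggerSketch`, the `Action` enum, the `DISABLED` sentinel and the `decide` method are all illustrative names assumed here, and the 16 MB figure in `main` is taken from the old Javadoc text removed by this patch.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical model of the documented trigger rules; not part of Lucene.
final class FlushTriggerSketch {
    enum Action { FLUSH_SEGMENT, APPLY_DELETES }

    // Assumed sentinel meaning "this trigger is switched off".
    static final int DISABLED = -1;

    private final double ramBufferSizeMB;
    private final int maxBufferedDocs;
    private final int maxBufferedDeleteTerms;

    FlushTriggerSketch(double ramBufferSizeMB, int maxBufferedDocs, int maxBufferedDeleteTerms) {
        this.ramBufferSizeMB = ramBufferSizeMB;
        this.maxBufferedDocs = maxBufferedDocs;
        this.maxBufferedDeleteTerms = maxBufferedDeleteTerms;
    }

    // Returns the actions the documented rules would call for in the given state.
    Set<Action> decide(double ramUsedMB, int bufferedDocs, int bufferedDeleteTerms) {
        EnumSet<Action> actions = EnumSet.noneOf(Action.class);

        // RAM usage or document count: either one triggers a segment flush.
        boolean ramFull = ramBufferSizeMB != DISABLED && ramUsedMB >= ramBufferSizeMB;
        boolean docsFull = maxBufferedDocs != DISABLED && bufferedDocs >= maxBufferedDocs;
        if (ramFull || docsFull) {
            actions.add(Action.FLUSH_SEGMENT);
        }

        // Delete-term count: applies buffered deletes, never flushes a segment.
        if (maxBufferedDeleteTerms != DISABLED && bufferedDeleteTerms >= maxBufferedDeleteTerms) {
            actions.add(Action.APPLY_DELETES);
        }
        return actions;
    }

    public static void main(String[] args) {
        // Flush by RAM only (16 MB), with deletes applied every 1000 buffered terms.
        FlushTriggerSketch sketch = new FlushTriggerSketch(16.0, DISABLED, 1000);
        System.out.println(sketch.decide(1.0, 10, 1000));
        System.out.println(sketch.decide(16.0, 10, 0));
    }
}
```

In this model, reaching the delete-term limit alone never yields `FLUSH_SEGMENT`, which mirrors the "won't trigger a segment flush" note the patch adds to `setMaxBufferedDeleteTerms`.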