I tried out this patch (July18), and have a few comments...
First, it is nice to be able to add/remove documents with no need to care for switching between readers and writers, and without worrying for performance issues as result of that switching. I did not test for performance yet.
This post is quite long, so here is an outline...
(1) Compile error in test code
(2) Failing tests - is this patch actually fixing a bug in current flushRamSegments()?
(3) Additional tests I ran
(4) Javadocs remarks
(5) deleteDocument(int doc) not implemented
(6) flush() not implemented
(7) Method name - batchDeleteDocuments(Term)
(8) Class name and placement + What's Next for this patch
------------ (1) Compile error in test code
The new TestWriterDelete does not reflect recent name change from IndexWriter to NewIndexModifier. Easily fixed by renaming accordingly in that file.
------------ (2) Failing tests - does this patch also fix a bug in current flushRamSegments()?
"ant test" has one failure: TestIndexModifier.testIndex().
This is the same issue that Ning described above. However I think it exposes a bug in current flushRamSegments(): when an index with 1 segment on disk that has 2 documents, one of which is deleted, and 1 segment in memory, is closed, this method decides to merge - prematurely - the two segments into one. This wrong behavior (if I understand things correctly) is - by "mistake" - causing TestIndexModifier.testIndex() to pass in the current implementation of flushRamSegments(). But this comes with the cost of too many merges. If one is interleaving adds and deletes this bug would become costly. I will post a separate question on this to the dev forum, to discuss if this is indeed a bug.
------------ (3) Additional tests I ran
I wanted to verify that all existing functions (well, at least tested ones..) are working with the new class (NewIndexModifier). So I temporarily renamed the existing IndexWriter to IndexWriter0, and renamed NewIndexModifier to IndexWriter (now extending IndexWriter0). For compiling, now, also had to temporarily modify args from IndexWriter to IndexWriter0 in 3 classes - DocumentWriter, SegmentMerger, and also from NewIndexModifier to IndexWriter in the new TestWriteDelete. (Note again: these modifications are temporary, just for the sake of testing this new class as if it was the new IndexWriter, which it is not.) Now all the tests were using the new class instead of the original IndexWriter.
All tests passed, except for TestIndexModifier.testIndex() - this is the same failure as above - so, no problem detected in new class.
------------ (4) Javadocs Remarks
Current Javadocs for the new class focus on changes to the implementation. I think this description of implementation changes should be made regular Java comments (for developers), and instead should add a shorter javadoc that describes the API for users, and the implications on behavior as result of buffering deletes.
------------ (5) deleteDocument(int doc) not implemented
Original IndexModifier has a delete(int docs), the new class doesn't. At first this seems ok, since internal doc IDs are not accessible through index writer (unlike index reader). But IndexModifier also does not provide access to doc-ids. So why was delete-by-id enabled in IndexModifier? Perhaps there's a good reason for it, that I fail to see - if so, it should probably be added to the new class as well. Adding this is required if the new class would eventually replace the implementation of current index modifier.
------------ (6) flush() not implemented
Original IndexModifier has a flush(int docs) method, allowing to commit any pending changes. I think it would be nice to have this feature here as well, for forcing any pending changes (without caller having to modify the max-bufferred value). This would allow more control when using this class. Again, adding this is required if the new class would eventually replace the implementation of current index modifier.
------------ (7) Method name - batchDeleteDocuments(Term)
I would prefer it to be called deleteDocuments(Term), and let Java decide which method to call. Main reason is developers would expect that methods with similar semantics are named similarly, especially when using IDEs like Eclipse, where users type "i.del" and the IDE lets them select from all the methods that start with "del".
------------ (8) Class name and placement + What's Next for this patch
Performance test should be added for this new class. Also, I did not code review the actual code changes to IndexWriter and the code of NewIndexModifier itself.
It seems to me that this class would be very useful for users, either as a new class or if it replaces the current implementation of IndexModifier. Latter would be possible only if the 2 missing methods mentioned above are added. In this case, the "immediate delete" behavior of current IndexModifier should be possible to achieve by users, by setting maxBefferedDeleteTerms to 1.
One disadvantage of this class vs. current IndexModifier is the ability to add access to further methods of IndexReader. With current IndexModifier this is very simple (though not always efficient) - just follow code template in existing methods, i.e. close writer/reader and open reader/writer as required. With the new class, exposing further methods of IndexReader would be more of a challenge. Perhaps having a multiReader on all segment readers can do. I am not sure to what extent this should be a consideration, so just bringing it up.
So, If this class replaces IndexModifier - fine. If not, how about calling it BufferredIndexWriter?