
Key: LUCENE-325
Type: Improvement
Status: Closed
Resolution: Fixed
Priority: Minor
Assignee: Michael McCandless
Reporter: John Wang
Votes: 0
Watchers: 1

Lucene - Java

[PATCH] new method expungeDeleted() added to IndexWriter

Created: 15/Dec/04 08:55 AM   Updated: 11/Oct/08 12:49 PM
Component/s: Index
Affects Version/s: CVS Nightly - Specify date in submission
Fix Version/s: 2.4

Time Tracking:
Not Specified

File Attachments:
  File                     Type              Date                 Author              Size
  attachment.txt           Text File         2004-12-15 01:49 PM  John Wang           2 kB
  IndexWriter.patch        Text File         2004-12-18 10:56 AM  John Wang           3 kB
  IndexWriter.patch        Text File         2004-12-15 08:57 AM  John Wang           112 kB
  LUCENE-325.patch         Text File         2008-02-09 03:20 PM  Michael McCandless  14 kB (licensed for inclusion in ASF works)
  TestExpungeDeleted.java  Java Source File  2004-12-15 02:13 PM  John Wang           3 kB
Environment:
Operating System: Windows XP
Platform: All

Bugzilla Id: 32712
Resolution Date: 11/Feb/08 08:35 PM


Description
We make use of the docIDs in Lucene. I need a way to compact the docIDs in
segments to remove the "holes" created by deletes. The only way to do this today
is to call IndexWriter.optimize(). That is a very heavy call: when the index is
large but has only a small number of deleted docs, calling optimize is not
practical.

I need a new method, expungeDeleted(), which finds all the segments that have
deleted documents and merges only those segments.
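
To make the request concrete, here is a minimal sketch of what "compacting" means. It is a toy model and not Lucene code: a segment is assumed to be a plain list of documents plus a bit set of deleted docIDs, and the class and method names (DocIdCompactionSketch, compact) are hypothetical. Rewriting the segment drops the deleted entries, so the survivors get consecutive docIDs again while keeping their relative order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

/** Toy model, not Lucene code: a segment is a list of docs plus a deleted-docs bit set. */
final class DocIdCompactionSketch {

    /**
     * Returns the surviving documents in their original order.
     * A document's new docID is its index in the returned list.
     */
    static List<String> compact(List<String> docs, BitSet deleted) {
        List<String> live = new ArrayList<>();
        for (int docId = 0; docId < docs.size(); docId++) {
            if (!deleted.get(docId)) {
                live.add(docs.get(docId));
            }
        }
        return live;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList("a", "b", "c", "d", "e");
        BitSet deleted = new BitSet();
        deleted.set(1); // "b" was deleted, leaving a hole at docID 1
        deleted.set(3); // "d" was deleted, leaving a hole at docID 3

        List<String> live = compact(docs, deleted);
        for (int newId = 0; newId < live.size(); newId++) {
            System.out.println(newId + " -> " + live.get(newId));
        }
    }
}
```

In this model only segments whose bit set is non-empty need to be rewritten, which is why the operation can be much cheaper than optimizing the whole index.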

I have implemented this method and have discussed submitting a patch with Otis.
I don't see where I can attach the patch. I will follow the patch guidelines
and email the Lucene mailing list.

Thanks

-John




John Wang added a comment - 15/Dec/04 08:57 AM
Created an attachment (id=13757)
Patched IndexWriter.java, contains a new method: expungeDeleted()

I see now where to attach the patch

See attached patch.


Otis Gospodnetic added a comment - 15/Dec/04 12:13 PM
Either this Bugzilla diff is not working properly, or you've made a number of
other more stylistic changes to IndexWriter.java, which make it very hard to see
the real change you made. Could you provide a diff with just the real changes?

Also, if you have a unit test that exercises your addition, so we can be sure
it doesn't break anything, that would be great. You'd want your unit test to
mix and match Document additions and deletions and make sure it all still works
properly.
Thanks.


John Wang added a comment - 15/Dec/04 01:49 PM
Created an attachment (id=13759)
implementation of expungeDeleted method

Attached is the implementation of IndexWriter.expungeDeleted method. It is self
contained.

Thanks

-John


John Wang added a comment - 15/Dec/04 02:13 PM
Created an attachment (id=13760)
Unit test for the submitted patch

Attached please find the unit test for IndexWriter.expungeDeleted() requested by
Otis.

Thanks

-John


John Wang added a comment - 18/Dec/04 10:56 AM
Created an attachment (id=13776)
updated patch with a fixed diff version

1) As requested, I reran cvs diff without first reformatting the file (the
reformatting is what had thrown off the earlier diff).

2) The implementation of expungeDeleted was improved to handle deleting the
compound file after the segment info update.


Grant Ingersoll added a comment - 13/Jan/08 03:29 PM
This seems generally useful. I imagine, though, that the patch is way out of date. I wonder if the new ability to merge some segments might have an option to do this kind of thing.

Any thoughts on resurrecting this?


Michael McCandless added a comment - 14/Jan/08 03:36 PM
I think we should resurrect this: I agree it's useful. I'll take it & tentatively mark it 2.4 (hopefully I can make time by then!).

The original patch would simply merge one segment "in place". I think we can improve this a bit by merging any adjacent series of segments that have deletions? This would still preserve docID ordering, but would also accomplish some merging as a side effect (I think a good thing).


Michael McCandless added a comment - 09/Feb/08 03:20 PM
Attached patch. All tests pass. I plan to commit in a day or two.

This adds two methods to IndexWriter:

expungeDeletes() – defaults to doWait=true
expungeDeletes(boolean doWait)

If doWait is false, and you have a MergeScheduler that runs merges in
BG threads, then the call returns immediately.

I extended MergePolicy so it decides what "expunge deletes" really
means (findMergesToExpungeDeletes). Then, in LogMergePolicy, I
implemented this policy: we merge all adjacent segments (up to
mergeFactor at once) that have deletes. If only 1 segment has
deletes, it's a singular merge.
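
As an illustration of the selection rule described above, here is a minimal sketch. It is not the code from the attached patch and does not use Lucene's MergePolicy API: the class and method names (ExpungeDeletesPolicySketch, findMerges) are hypothetical, segments are reduced to a boolean "has deletes" flag, and each selected merge is returned as a half-open index range [start, end). Adjacent segments with deletes are grouped into one merge, a group is cut once it reaches mergeFactor segments, and an isolated segment with deletes becomes a merge of one.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of the rule as described in this comment; not the LogMergePolicy code. */
final class ExpungeDeletesPolicySketch {

    /**
     * Returns half-open ranges [start, end) of adjacent segments to merge.
     * hasDeletes[i] says whether segment i contains deleted documents.
     */
    static List<int[]> findMerges(boolean[] hasDeletes, int mergeFactor) {
        if (mergeFactor < 1) {
            throw new IllegalArgumentException("mergeFactor must be >= 1");
        }
        List<int[]> merges = new ArrayList<>();
        int runStart = -1; // start of the current run of segments with deletes, or -1
        for (int i = 0; i < hasDeletes.length; i++) {
            if (hasDeletes[i]) {
                if (runStart == -1) {
                    runStart = i; // a new run begins here
                } else if (i - runStart == mergeFactor) {
                    merges.add(new int[] {runStart, i}); // run is full, cut it
                    runStart = i;
                }
            } else if (runStart != -1) {
                merges.add(new int[] {runStart, i}); // a clean segment ends the run
                runStart = -1;
            }
        }
        if (runStart != -1) {
            merges.add(new int[] {runStart, hasDeletes.length}); // run reaches the end
        }
        return merges;
    }

    public static void main(String[] args) {
        boolean[] hasDeletes = {true, true, false, true, false, false, true, true, true};
        for (int[] merge : findMerges(hasDeletes, 2)) {
            System.out.println("[" + merge[0] + ", " + merge[1] + ")");
        }
    }
}
```

With mergeFactor set to 2, the flags in main select the ranges [0, 2), [3, 4), [6, 8) and [8, 9). Segments without deletes are never touched, and because each merge covers only adjacent segments, docID order is preserved.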


Michael McCandless added a comment - 11/Feb/08 08:35 PM
I just committed this. Thanks John! And sorry for the looooong
delay.

I also added a "these APIs are experimental" warning at the top of
MergePolicy and MergeScheduler (which I should have done before
2.3, though I don't expect a lot of usage of these).