Key: LUCENE-1382
Type: Improvement
Status: Closed
Resolution: Fixed
Priority: Minor
Assignee: Michael McCandless
Reporter: Michael McCandless
Votes: 0
Watchers: 0

Lucene - Java

Allow storing user data when IndexWriter.commit() is called

Created: 11/Sep/08 12:19 PM   Updated: 25/Sep/09 04:23 PM
Component/s: Index
Affects Version/s: None
Fix Version/s: 2.9

Time Tracking:
Not Specified

File Attachments:
LUCENE-1382.patch (12 kB, text file), attached 2008-09-17 10:19 AM by Michael McCandless; licensed for inclusion in ASF works

Lucene Fields: New
Resolution Date: 20/Oct/08 11:33 AM


Description
Spinoff from here:

http://www.mail-archive.com/java-user@lucene.apache.org/msg22303.html

The idea is to allow optionally passing an opaque String commitUserData to the IndexWriter.commit method. This String would be stored in the segments_N file, and would be retrievable by an IndexReader. Applications could then use this to assign meaning to each commit.
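As a rough illustration of the idea, here is a minimal plain-Java sketch. It is a toy model and not Lucene code: the class `ToyCommitLog` and its methods are hypothetical names, and each list entry merely stands in for one segments_N file carrying its own opaque string.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: this is NOT Lucene code. Each element of `commits`
// stands in for one segments_N file and the user data stored in it.
class ToyCommitLog {
    // One entry per commit; index i plays the role of segments_(i+1).
    private final List<String> commits = new ArrayList<>();

    // Hypothetical analogue of the proposed commit(String commitUserData).
    // Returns the generation number N of the commit just recorded.
    int commit(String commitUserData) {
        commits.add(commitUserData);
        return commits.size();
    }

    // Hypothetical analogue of a reader retrieving the newest commit's data.
    // Returns null if nothing has been committed yet.
    String latestUserData() {
        return commits.isEmpty() ? null : commits.get(commits.size() - 1);
    }

    // User data recorded with a specific generation (1-based).
    String userDataFor(int generation) {
        return commits.get(generation - 1);
    }
}
```

An application could, for example, store the offset of the last item it indexed, so that each commit carries the meaning "everything up to this offset is in the index".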

It would be nice to get this done for 2.4, but I don't think we should hold the release for it.



Comments
Nadav Har'El added a comment - 12/Sep/08 12:53 PM
Hi Mike,
If you add this feature, I suggest you clearly document its purpose. Writing a short comment in the segments file can be useful when utmost performance is needed (e.g., when you just want to check whether to reopen() or not), but my guess is that for most other scenarios there is already an alternative in existing Lucene.

I.e., one possible way to achieve almost the same goal without changing anything in the code is to put a special document in the index: e.g., a document with some unique searchable field/term combination (just to find this document) and a stored field holding your "userdata". Before doing the next commit(), just update (i.e., delete and add) this document with the new value, and commit().

This method is not as quick as the one you proposed, but I think that for most uses it is quick enough, and more versatile. For example, what happens with the proposed feature if the code that needs to write this "user data" is in a library, which cannot control exactly when commit() is called? And what happens if several different libraries or code modules want to write their own different "commit user data"? With the simple alternative method I mentioned: 1. several "commit user data" values can coexist (by using different field/term combinations to find them); 2. a library can put the commit user data into the index and have it take effect on the next commit (rather than needing to specify it in the commit() call); 3. an extra commit() call does not delete the previously set data (I'm not sure what you intend to do in this case in your suggestion).
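The marker-document alternative and its three properties can be sketched in plain Java. This is a toy model under loud assumptions, not Lucene code: a "document" is reduced to a unique key plus one stored value, and `ToyMarkerIndex`, `updateMarker`, and `readMarker` are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only, not Lucene code: a marker "document" is reduced to
// a unique key (the searchable term) mapped to one stored value.
class ToyMarkerIndex {
    // Current state including changes that are not committed yet.
    private final Map<String, String> working = new HashMap<>();
    // Snapshot taken at the last commit; this is what readers see.
    private Map<String, String> committed = new HashMap<>();

    // Update = delete the old marker document (if any), then add the new one.
    // The change only becomes visible at the next commit().
    void updateMarker(String uniqueKey, String userData) {
        working.remove(uniqueKey);
        working.put(uniqueKey, userData);
    }

    // Publishes all pending changes; takes no user-data argument.
    void commit() {
        committed = new HashMap<>(working);
    }

    // Reads a marker as of the last commit; null if it was never committed.
    String readMarker(String uniqueKey) {
        return committed.get(uniqueKey);
    }
}
```

In this sketch two modules can each maintain their own marker key, neither needs to be the caller of commit(), and an extra commit() leaves previously written markers in place.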

I guess that it wouldn't hurt to add the feature that you propose - I just hope that people don't start using it for things for which established Lucene mechanisms (like documents) would have been better.


Mark Harwood added a comment - 12/Sep/08 02:06 PM
I feel uncomfortable with the suggestion of inserting "fake" documents for the reasons outlined here:
http://www.mail-archive.com/java-user@lucene.apache.org/msg22294.html

Michael McCandless added a comment - 12/Sep/08 05:22 PM
It seems like there is a place for both approaches, since they have different tradeoffs?

I plan to add a static IndexReader method, much like, e.g., getVersion, that lets you retrieve the commitUserData of the current segments_N file. So, you're right, this enables fast checking of the latest commitUserData in the index.

I was thinking it's a single String, and if you set it (by calling commit(commitUserData)) and the commit completes successfully, it overwrites whatever was there previously.
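The single-String, overwrite-on-success semantics, together with the fast check it enables, might look roughly like this in a plain-Java toy model. Nothing here is Lucene API: `ToySingleSlotIndex`, `ToyReaderView`, and the `succeeded` flag (a stand-in for whether a real commit completed) are all assumptions made for illustration.

```java
import java.util.Objects;

// Toy model only, not Lucene code.
class ToySingleSlotIndex {
    // Single value for the whole index; null until the first successful commit.
    private String commitUserData;

    // Overwrites the previous value, but only when the commit "succeeds".
    // The succeeded flag is a hypothetical stand-in for a real commit's outcome.
    void commit(String newUserData, boolean succeeded) {
        if (succeeded) {
            commitUserData = newUserData;
        }
    }

    // Analogue of a cheap lookup that needs no open reader.
    String latestCommitUserData() {
        return commitUserData;
    }
}

// Remembers the user data that was current when the view was opened.
class ToyReaderView {
    private final String openedWith;

    ToyReaderView(ToySingleSlotIndex index) {
        this.openedWith = index.latestCommitUserData();
    }

    // Cheap check: has the stored user data changed since this view was opened?
    boolean shouldReopen(ToySingleSlotIndex index) {
        return !Objects.equals(openedWith, index.latestCommitUserData());
    }
}
```

Whether a change in the user data is the right signal for reopening is an application-level decision; the sketch only shows that the comparison itself is cheap.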


Michael McCandless added a comment - 17/Sep/08 10:19 AM
Attached patch. I think we should not rush this into 2.4, so I'll wait to commit to trunk until after we've branched.