|
I feel uncomfortable with the suggestion of inserting "fake documents for the reasons outlined here:
http://www.mail-archive.com/java-user@lucene.apache.org/msg22294.html It seems like there is a place for both approaches, since they have different tradeoffs?
I plan to add a static IndexReader method, much like eg getVersion, that lets you retrieve the commitUserData of the current segments_N file. So, you're right, this enables fast checking of the latest commitUserData in the index. I was thinking it's a single String, and if you set it (by calling commit(commitUserData)), and the commit completes successfully, it overwrites whatever was there, previously. Attached patch. I think we should not rush this into 2.4, so, I'll wait to commit to trunk until after we've branched.
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
If you add this feature, I suggest you clearly document its purpose. Writing a short comment in the segments file can be useful when utmost performance is needed (e.g., like when you just want to check if you want to reopen() or not), but my guess is that for most other scenarios, there's an alternative in the existing Lucene.
I.e., one possible way to achieve almost the same goal but without changing anything in the code is to put a special document in the index - e.g., imagine you put a document with some unique searchable field/term combination (just to find this document), and a stored field with your "userdata". Before doing the next commit(), just update (i.e., delete and add) this document to a new value, and commit().
This method is not as super-quick as the one you proposed, but I think that for most uses it is quick enough, and more versatile. For example, what happens with the proposed feature if the code that needs to write this "user data" is in a library, which cannot control exactly the commit() times? And what happen if several different libraries or code modules want to write their own different "commit user data"? With the simple alternative method I mentioned you have: 1. several "commit user data"s can exist (by using different field/term to find them), 2. a library can put the commit user data to the index and have it take effect on the next commit (rather than needing to specify it in the commit() call). 3. an extra commit() call does not delete the previously set data (I'm not sure what you intend to do in this case in your suggestion).
I guess that it wouldn't hurt to add the feature that you propose - I just hope that people don't start using it for things that established Lucene mechanisms (like documents) would have been better.