|
[
Permlink
| « Hide
]
Michael McCandless added a comment - 27/Oct/06 11:41 PM
ZIP file that needs to be put in src/test/org/apache/lucene/index (used by backwards compatibility test).
ZIP file that needs to be put in src/test/org/apache/lucene/index (used by backwards compatibility test).
Nice job on this very ambitious patch (all 3500 lines of it!) Good tests are certainly important!
Could you elaborate on how the backward compatibility works w.r.t. modifying an old index? In the absense of contention, have you noticed any performance differences in opening an IndexReader? > on one platform (OS X) I found that the directory listing can be incorrect (stale) by up to 1.0 seconds. That sucks... (but great job on the very thorough testing). Thanks Yonik for looking at this!
> Could you elaborate on how the backward compatibility works w.r.t. modifying an old index? OK, first off, the only file whose contents and name are changed is The unit test I added for this (TestBackwardsCompatibility) un-zips a On opening an pre-lockless index, first we see only a "segments" file When loading each Segmentinfo, since the format of the segments files Now, when a writer commits to this pre-lockless index, we write in the The old segments have written the "0"'s for delGen, normGen (null is If a delete or setNorm call takes place against an old segment format, So an index can have mixed old/new segments. The SegmentInfo for each Once an optimize is done, or, all old segments have been merged away, > > on one platform (OS X) I found that the directory listing can be incorrect (stale) by up to 1.0 seconds.
> > That sucks... (but great job on the very thorough testing). > Can it happen with the latest version of OS X? If not, couldn't we just require an upgrade, or do you think that other platforms suffer from this. Yes, this was very surprising and annoying! Unfortunately, this > In the absense of contention, have you noticed any performance differences in opening an IndexReader?
[First one side note: it's during contention that lock-less really I ran the following basic performance test. In each case I measure In order to not count contention, I made temporary mods, to both All times are mili-seconds. Each time is formatted as current Lucene Local index (Linux): 4.62 (6.04) NFS remote index (Linux): 171.26 (11.04) The "remote index" case means a writer on that OS, and a reader on One caveat: I saw quite a bit of variance between runs. I tried to Anyway, the surprising thing is that lockless is faster in most cases I'm not sure these speedups/differences are all that important in a Thanks for all the details Michael! A few more random comments and questions:
In the future, it might be nice if there was an option to disable segments.gen to be more friendly to write-once filesystems like HDFS. As far as performance goes, I was personally interested in the contentionless case since that's what processes that coordinate everything (like Solr) will see. I'm not sure I understand the "segments.gen" logic of writing two longs that are identical. The file deleting code does much more than in the past, and that's a good thing IMO. For example it looks like leftover non-compound segment files from a failed CFS merge (say the JVM dies) will now be cleaned up! I'm having a hard time figuring out how older delete files are removed (since they contain the current segment name, it looks like findDeletableFiles would skip them). Good questions!
> In the future, it might be nice if there was an option to disable I think this makes sense. I will add control over this on the next > As far as performance goes, I was personally interested in the Ahh got it, OK. That's fair. > I'm not sure I understand the "segments.gen" logic of writing two Right, I settled on a simplification of that approach. I write two One thing I will also add is a version header to this file. > The file deleting code does much more than in the past, and that's a Oh, right! In fact any time an index crashes not having committed its > I'm having a hard time figuring out how older delete files are Oooh – you are correct. I do record this file for deleting at the Can the following scenario happen with lock-less commits?
1 A reader reads segments.1, which says the index contains seg_1. > 3 The reader tries to load seg_1 and fails.
That wouldn't be considered a failure because it's part of the retry logic. At that point, an attempt would be made to open seg_2. Then a question might be, could a writer possibly change the index fast enough to prevent a reader from opening at all? I don't think so (and it would be a mis-configured writer IMO), but maybe Michael could speak to that. > That wouldn't be considered a failure because it's part of the retry logic. At that point, an attempt would be made to open seg_2.
From the description of the retry logic, I thought the retry logic only applies to the loading of the "segments_N" file, but not to the entire process of loading all the files of an index. You are right, it wouldn't be a failure if the retry logic is applied to the loading of all the files of an index. > > In the future, it might be nice if there was an option to disable
> > segments.gen to be more friendly to write-once filesystems like > > HDFS. > I think this makes sense. I will add control over this on the next > iteration of the patch. Just to be clear, I didn't mean that I thought it was needed now... When this option is added, perhaps the configuration name should be generic and not tied to the implementation specifics that could change more frequently? Something like WRITE_ONCE or setWriteOnce()? Right, this is just normal contention. We do indeed retry around not In Lucene currently, contention causes a pause (default 1.0 second) It's important to note that at any given instant, the index is always But, because a reader takes non-zero time to load the index, you can There are several ways that contention will manifest itself. These
Anyway, on hitting an IOException, we first retry segments_N-1 (if it I added a couple of tests cases to TestIndexWriter to verify that a On Yonik's question: > Then a question might be, could a writer possibly change the index This is definitely possible. This really is a form of "starvation". Both current Lucene and lockless will hit starvation under high enough Still, I think the point at which starvation starts to happen is far > > In the future, it might be nice if there was an option to disable
> > segments.gen to be more friendly to write-once filesystems like > > HDFS. > I think this makes sense. I will add control over this on the next > iteration of the patch. > Just to be clear, I didn't mean that I thought it was needed now... > When this option is added, perhaps the configuration name should be OK, I see, this is part of a wider context. Maybe it's the creation OK, another version of the lockless commits patch with these fixes:
You still need to put those two zip files into This addresses all feedback/TODOs that I knew about. All unit tests pass. Looks good Michael, I think this is ready to commit!
Does anyone have any concerns with this going into the trunk? I'm all for the patch ... the only thing I'm wondering is about release timing, if that's issue? This changes the on-disk format, which affects more than Lucene Java and, should anyone that's using Lucene out there care (via scripts, etc.), the naming of files on disk.
I'm just wondering if there's any interest/reason for doing a 2.1 before something with those side effects? Steven - I don't see any issues with this going in before 2.1. As a matter of fact, this may be a sufficiently substantial change that will make us want to make a 2.1 release.
Maybe Michael should commit this next week. Oooh – I would love to!
Closing all issues that were resolved for 2.1.
|
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||