|
Agree... higher level synchronization by the user is the right way to ensure/enforce an ordering. I've just started reviewing this patch.
Since doWait() can return after 1 sec, the pattern is to use a while loop with the condition that caused it to be called. But in some cases, it's hard to tell if the code is correct.... for example copyExternalSegments() is hard because of the other non-trival code code in the loop where it's not clear if it's safe/correct to call that code again. Perhaps registerMerge() detects the conflict with another merge with the same segments and that's what keeps things correct? Comments to the effect of why it's OK to run certain code more than once would be very welcome. Thanks, Yonik. I'll add a comment there, and any other places where I call doWait that look similarly confusing...
Improved comments in expungeDeletes & copyExternalSegments.
Yonik, any more feedback on this patch?
I plan to commit this in a day or two.
OK, attached refreshed patch to trunk.
I agree that we should not make any API promises about what
it means when the methods (commit, close, rollback, optimize, addIndexes) are called concurrently from different threads. The discussion below is on their current behavior. > Only one addIndexes can run at once, so call to 2nd or more This is achieved by the read-write lock. > close() and rollback() wait for any running addIndexes to finish Just to clarify: close(waitForMerges=false) and rollback() make > commit() waits for any running addIndexes, or any already running commit() and commit(long) use the read-write lock to wait for > optimize() is allowed to run concurrently with addIndexes; the two This is nice. More detailed comments:
True, agreed.
It's because commit() calls prepareCommit(), which throws a
This is to make sure any just-started addIndexes cleanly finish or
This method has always carried out merges in the FG, but it's in fact It is tempting to fully schedule these external merges (ie allow them
I had to move the optimize() inside the transaction because it could But this change will tie up more disk space than addIndexes used to
Yeah I was using this for internal debugging, and I think it's New patch incorporating Ning's comments (thanks Ning!).
> It's because commit() calls prepareCommit(), which throws a
> "prepareCommit was already called" exception if the commit was already > prepared. Whereas commit(long) doesn't call prepareCommit (eg, it > doesn't need to flush). Without this, I was hitting exceptions in one > of the tests that calls commit() from multiple threads at the same > time. Is it better to simplify things by serializing all commit()/commit(long) calls? > This is to make sure any just-started addIndexes cleanly finish or I wonder if we can simplify the logic... Currently in setMergeScheduler, > This method has always carried out merges in the FG, but it's in fact Hmm... so merges involving external segments may be in FG or BG? > It is tempting to fully schedule these external merges (ie allow them Then what about those BG merges involving external segments?
I don't think so: with autoCommit=true, the merges calls commit(long)
Good catch – I'll make setMergeScheduler synchronized.
Sure we can change the name – do you have one in mind? Maybe
What'll happen is the BG merge will hit an exception, roll itself > I don't think so: with autoCommit=true, the merges calls commit(long)
> after finishing, and I think we want those commit calls to run > concurrently? After we disable autoCommit, all commit calls will be serialized, right? > What'll happen is the BG merge will hit an exception, roll itself Back to the issue of running an external merge in BG or FG.
Right.
Good point! We no longer need to check for isExternal in CMS's merge() method – we can run all merges in the BG. In fact I think it's no longer necessary to even compute & record isExternal (this was its only use). Hmmm, except when I take this out I'm seeing testAddIndexOnDiskFull hang. I'll dig. OK new patch attached with changes discussed above.
I did fix CMS to happily perform merges involving external segments The fix was simple: I changed resolveExternalSegments (renamed from Maybe this should be a separate JIRA issue. In doWait(), the comment says "as a defense against thread timing hazards where notifyAll() falls to be called, we wait for at most 1 second..." In some cases, it seems that notifyAll() simply isn't called, such as some of the cases related to runningMerges. Maybe we should take a closer look at and possibly simplify the concurrency control in IndexWriter, especially when autoCommit is disabled?
I agree – I'm looking forward to taking autoCommit=true case out in 3.0. I'll try to simplify the concurrency control at that point, and test for any deadlocks if doWait is replaced with the "real" wait(), to catch any missing notifyAll()'s. Ning (or anyone), any more feedback on this one? Else I plan to commit soon...
Committed revision 690537
|
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Attached patch. Here're the changes:
addIndexes just blocks until the one is done.
and then blocks new addIndexes calls
commit, to finish, then quickly takes a snapshot of the segments
and syncs the files referenced by that snapshot. While syncing is
happening addIndexes are then allowed to run again.
simply wait for their respective merges to finish.
I think we should not make any API promises about what it means when
these methods (commit, close, rollback, optimize, addIndexes) are
called concurrently from different threads, except that the methods
all work correctly, IndexWriter won't throw an errant exception, and
your index won't become corrupt.
I made one additional change which is technically a break in backwards
a minor acceptable one: I no longer
compatibility, but I think
allow the same Directory to be passed into addIndexes* more than once.
This was necessary because we identify a SegmentInfo by its
Directory/name pair, and passing in the same Directory allowed dup
SegmentInfo instances to enter SegmentInfos, which is dangerous.
I'll wait a few days before committing.