|
Hoss Man made changes - 03/Nov/07 05:23 AM
Hoss Man made changes - 03/Nov/07 05:23 AM
The last comment is not correct, in that, there are many Java based applications (and non-java) that offer true transactional integrity.
It usually involves a log file, and using sync to ensure data is written to disk. The Lucene structure allows for this VERY easily, as the 'segments' file controls everything. If all previous files are "synced", and then the 'segments.new' file written, and synced (with a marker/checksum). Then the old 'segments' deleted, and 'segments.new' 'renamed to 'segments'. It is trivial to ensure transactional integrity. Upon index open, check for segments.new - if doesn't exist, or does not have a valid checksum, delete all segments not in 'segments', if it is valid, then reattempt the rename. Then open the index. Robert,
Are your comments applicable for version 1.4.3? The behavior of an all-zero 'segments' and 'deleted' files is very easily reproduced. Also, there is no left over 'segments.new' after a power-cord yank. Thanks. See the healthy follow-on discussion here:
http://www.gossamer-threads.com/lists/lucene/java-dev/54300 I plan to add optional argument when calling FSDirectory.getDirectory() to ask all created FSIndexOutputs to always call sync() on the file descriptor before closing it.
Michael McCandless made changes - 04/Nov/07 10:34 AM
This recent thread is also relevant here:
Attached patch that adds optional "doSync" boolean to
FSDirectory.getDirectory(...). It defaults to "false". When true, I call file.getFD().sync() just before file.close() in FSIndexOutput.close(). However, I can't figure out how to also sync the directory. Does All tests pass if I default it to true or to false.
Michael McCandless made changes - 04/Nov/07 03:28 PM
Attached another rev of the patch, that adds "fsdirectory.dosync" I ran a quick perf test of sync vs no sync. I indexed all of analyzer=org.apache.lucene.analysis.SimpleAnalyzer
docs.file=/lucene/wikifull.txt doc.maker.forever=false ResetSystemErase RepSumByName This is on a quad core Mac OS X (Mac Pro) with a 4-drive RAID 0 IO I also tried opening the file descriptor with "rws", which I think is Maybe we should actually make doSync=true the default? It seems like
Michael McCandless made changes - 04/Nov/07 06:29 PM
Attached another rev of the patch.
I changed the default to "true": I think the small performance hit is Also put a try/finally around the the call to sync to make sure we
Michael McCandless made changes - 06/Nov/07 10:38 PM
I just committed this. Thanks Venkat!
Michael McCandless made changes - 10/Nov/07 05:51 PM
Was that compound or non-compound index format? I imagine non-compound will take a bigger hit since each file will be synchronized separately and in a serialized fashion. I also imagine that the hit will be larger for a weaker disk subsystem, and for usage patterns that continually add a few docs and close? Is a sync before every file close really needed, or can some of them be avoided when autocommit==false? I agree. Just for a baseline, I think the test needs to be done on a single drive system.
Also, the 'sync' should be optional. BerkleyDB offers similar functionality. The reason being, if the index can be completely recreated from other sources, you might not want to pay the performance hit, instead recreate the index if corruption/hard failure occurs.
The test was with compound file. But, the close() on each component file that goes into the compound We can't safely remove the sync() on each component file before I guess we could revisit whether that commit (before building the
OK I'll run the same test, but once on a laptop and once over NFS to Yes, continually adding docs & flushing/closing your writer will in
It's somewhat tricky to safely remove sync() even when If there were a way to sync a file after having closed it (is there?) Also, I was thinking we could start simple (call sync() before every
It is optional: I added doSync boolean to And, I agree: for cases where there is very low cost to regenerate the > Is a sync before every file close really needed [...] ?
It might be nice if we could use the Linux sync() system call, instead of fsync(). Then we could call that only when the new segments file is moved into place rather than as each file is closed. We could exec the sync shell command when running on Unix, but I don't know whether there's an equivalent command for Windows, and it wouldn't be Java... OK I ran sync/nosync tests across various platforms/IO system. In
each case I ran the test once with doSync=true and once with doSync=false, using this alg: analyzer=org.apache.lucene.analysis.SimpleAnalyzer doc.maker.forever=false ResetSystemErase RepSumByName Ie, time to index the first 150K docs from Wikipedia. Results for single hard drive: Mac mini (10.5 Leopard) single 4200 RPM "notebook" (2.5") drive – 2.3% slower: sync - 296.80 sec Mac pro (10.4 Tiger), single external drive – 35.5% slower: sync - 259.61 sec Win XP Pro laptop, single drive – 38.2% slower sync - 536.00 sec Linux (2.6.22.1), ext3 single drive – 23% slower sync - 185.42 sec Results for multiple hard drives (RAID arrays): Linux (2.6.22.1), reiserfs 6 drive RAID5 array – 49% slower (!!) sync - 239.32 sec Mac Pro (10.4 Tiger), 4 drive RAID0 array – 1% faster sync - 157.26 sec So at this point I'm torn... The performance cost of the simplest approach (sync() before close()) It's frustrating to lose such performance "out of the box" for the Maybe we should leave the default as false for now?
Michael McCandless made changes - 13/Nov/07 09:53 PM
> Maybe we should leave the default as false for now?
Perhaps for the short-term, but long-term it would be better to find a solution that's both reliable and doesn't have such a big performance impact. We really don't need to sync until we commit. It would be interesting to know how much it slows things to do that. As a quick hack we could try running the 'sync' command line program at each commit. If performance looks good, then we might look into implementing this in pure Java, changing FSDirectory.close() to queue FileDescriptors, add a background thread that syncs queued files, and add a Directory.sync() method that blocks until the queue is empty.
Agreed. I will default doSync back to false, for now.
I will test this as a hack first just to see how performance compares
Will do!
OK, I tested calling command-line "sync", after writing each segments
file. It's in fact even slower than fsync on each file for these 3 cases: Linux (2.6.22.1), reiserfs 6 drive RAID5 array 93% slower Linux (2.6.22.1), ext3 single drive 60% slower Mac Pro (10.4 Tiger), 4 drive RAID0 array 28% slower I'll look into the separate thread to sync/close files in the I think changing the only constructor in FSDirectory.FSIndexOutput is
an API change. I have a class that extends FSIndexOutput and it doesn't compile anymore after switching to the 2.3-dev jar. I think we should put this ctr back: Woops, OK I will put it back ...
I was wondering if delaying sync to actual commit point would run faster So... my small test sequentially writes M characters to N files
Each configuration ran twice and there are fluctuations, Comparing "immediate" to "background" I it is not clearly worth it With some artificial CPU activity added to the test program:
Attached FSyncPerfTest.java is the standalone (non Lucene) perf test that I used.
Doron Cohen made changes - 22/Nov/07 02:16 PM
> I found out however that delaying the syncs (but intending to sync) also
means keeping the file handles open [...] Not necessarily. You could just queue the file names for sync, close them, and then have the background thread open, sync and close them. The close could trigger the OS to sync things faster in the background. Then the open/sync/close could mostly be a no-op. Might be worth a try. OK I did a simplistic patch (attached) whereby FSDirectory has a
background thread that re-opens, syncs, and closes those files that Lucene has written. (I'm using a modified version of the class from Doron's test). This patch is nowhere near ready to commit; I just coded up enough so I ran the same alg as the tests above (index first 150K docs of
The good news is, the non-CFS case shows very little cost when we do The bad news is, the CFS case still shows a high cost. However, by One caveat: I'm using a 8 MB RAM buffer for all of these tests. As
Michael McCandless made changes - 27/Nov/07 08:14 PM
Woops, the last line in the table above is wrong (it's a copy of the line before it). I'll re-run the test.
How about if we don't sync every single commit point? I think on a crash what's important when you come back up is 1) index EG we could force a full sync only when we commit the merge, before we That would be a big simplification because I think we could just do This would also mean we cannot delete the commit points that were not Plus we would have to fix retry logic on loading the segments file to I think it should mean better performance, because the longer you wait > How about if we don't sync every single commit point?
I'm confused. The semantics of commit should be that all changes prior are made permanent, and no subsequent changes are permanent until the next commit. So syncs, if any, should map 1:1 to commits, no? Folks can make indexing faster by committing/syncing less often.
But must every "automatic buffer flush" by IndexWriter really be a Even if we use that policy, the BG sync thread can still fall behind I think "merge finished" should be made a "permanent commit" because Maybe we could make some flushes permanent but not all, depending on In general, I think the longer we can wait after flushing before Regardless of what policy we choose here (which commits must be made I also worry about those applications that are accidentally flushing > But must every "automatic buffer flush" by IndexWriter really be a
"permanent commit"? When autoCommit is true, then we should periodically commit automatically. When autoCommit is false, then nothing should be committed until the IndexWriter is closed. The ambiguous case is flush(). I think the reason for exposing flush() was to permit folks to commit without closing, so I think flush() should commit too, but we could add a separate commit() method that flushes and commits. > People who upgrade will suddenly get much worse performance. Yes, that would be bad. Perhaps the semantics of autoCommit=true should be altered so that it commits less than every flush. Is that what you were proposing? If so, then I think it's a good solution. Prior to 2.2 the commit semantics were poorly defined. Folks were encouraged to close() their IndexWriter to persist changes, and that's about all we said. 2.2's docs say that things are committed at every flush, but there was no sync, so I don't think changing this could break any applications. So I'm +1 for changing autoCommit=true to sync less than every flush, e.g., only after merges. I'd also argue that we should be vague in the documentation about precisely when autoCommit=true commits. If someone needs to know exactly when things are committed then they should be encouraged to explicitly flush(), not to rely on autoCommit. I modified the CFS sync case to NOT bother syncing the files that go New patched attached (still a hack just to test performance!) and new
I think deprecating flush(), renaming it to commit(), and clarifying
OK, I will test the "sync only when committing a merge" approach for And I agree we should be vague about, and users should never rely on, > I think deprecating flush(), renaming it to commit()
+1 That's clearer, since flushes are internal optimizations, while commits are important events to clients.
I am taking this approach now, but one nagging question I have is: do In code: file = new RandomAccess(path, "rw"); <do many writes to file> file.close(); new RandomAccess(path, "rw").getFD().sync(); Are we pretty sure that all of the "many writes" will in fact be I haven't been able to find convincing evidence one way or another. I Robert do you know? I sure hope the answer is yes ... because if not, the alternative is Another nuance here is ... say we do a "soft commit" (write a new
segment & segments_N but do not sync the files), and, the machine crashes. This is fine because there will always be an earlier commit point (segments_M) that was a "hard commit" (sync was done). Then, machine comes back up and we open a reader. The reader sees We have retry logic in SegmentInfos to fallback to segments_M if we But, the problem is: the extent of the "corruption" caused by the I think this means when we do a "soft commit" we should not in fact The thing is, while we have been (and want to continue to be) vague From java-dev, Robert Engels wrote:
OK thanks Robert. I think very likely this approach (let's call it "sync after close") http://msdn2.microsoft.com/en-us/library/17618685 Also at least PostgreSQL and Berkeley DB "trust" _commit as the Though ... I am also a bit concerned about opening files for writing I think the alternative ("sync before close") is something like:
OK ... I'm leaning towards sticking with "sync after close", so I'll > I think this means when we do a "soft commit" we should not in fact
> write a new segments_N file (as we do today). +1 As long as we commit periodically when autoCommit=true I don't think we're breaking any previously advertised contract.
Michael McCandless made changes - 06/Dec/07 05:27 PM
I've moved this issue to 2.4. I think it's too risky to rush it in
just before 2.3 is released vs committing just after 2.3 and giving it more time on the trunk. But, I think for 2.3 we should revert the optional "doSync" argument
I think this would work too? FileInputStream fis = new FileInputStream(path);
fis.getFD().sync();
fis.close();
This was suggested & debated on the java-dev list. But, the man page ERRORS
EBADF fd is not a valid file descriptor open for writing.
And Yonik found at least one JVM implementation (I think Harmony) that I think we're walking on thin ice if we do that...
Oh, I skimmed too fast that part of the discussion in the Initial patch attached:
All tests now pass, but there is still alot to do, eg at least:
Michael McCandless made changes - 11/Dec/07 08:20 PM
Hi, Any progress on this issue?
I found sync call was removed from the source code. Is there an alternative to solve this problem? Thanks a lot! This is still in progress. It's clearly a serious bug since it's something out of your control that can easily cause index corruption.
sync was removed because the simple approach is far too costly on some IO systems. The new approach (sync only on committing a merge) has more reasonable performance, but is not quite done yet. New rev of this patch. All tests pass. I think it's ready to
commit, but I'll wait a few days for comments. This patch has a small change to the segments_N file: it adds a Unfortunately, in testing performance, I still see a sizable (~30-50%) So I tried sleeping, after writing and before syncing. I sleep based I think this must be because calling sync immediately forces the OS to It's disappointing to have to "game" the OS to gain back this On Linux 2.6.22 on a RAID5 array I still see a net performance cost of Other fixes:
Here are test details. I index first 200K Wikipedia docs with this analyzer=org.apache.lucene.analysis.standard.StandardAnalyzer doc.maker.forever = false { "BuildIndex" RepSumByPref BuildIndex Win2003 R64, JVM 1.6.0_03 Win XP Pro, laptop hard drive, JVM 1.4.2_15-b02 Linux ReiserFS on 6 drive RAID 5 array, JVM 1.5.0_08 Mac OS X 10.4 4-drive RAID 0 array, JVM 1.5.0_13
Michael McCandless made changes - 05/Feb/08 10:52 AM
On thinking through the above costs of committing, I now think we
should deprecate autoCommit=true entirely, making autocommit=false the only choice in 3.0. With that change, when you use an IndexWriter, its changes are never Here are some reasons:
What do people think? If we do this, I would right now deprecate all ctors that take > deprecate autoCommit=true entirely
+1 This sounds like a good plan. Are your performance numbers above with autoCommit true or false? Also, why not only sleep if Constants.WINDOWS?
OK I'll work out a new patch with this approach.
They were all with autoCommit=true.
Good, I'll take that approach! OK I updated the patch:
I'll wait until end of week to commit!
Michael McCandless made changes - 05/Feb/08 09:19 PM
Attached new rev of the patch. Only changes were to add caveats in javadcos about IO devices that ignore fsync, and, updated patch to apply cleanly on current trunk.
I plan to commit in a day or two.
Michael McCandless made changes - 08/Feb/08 08:31 PM
Michael McCandless made changes - 11/Feb/08 06:59 PM
Michael McCandless made changes - 11/Oct/08 12:49 PM
I had been following this thread. Just curious if the patch was committed.
Yes, this is committed and available as of 2.4.0.
|
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
second: to quote myself from a recent thread regarding lucene and "kill -9" ...
http://www.nabble.com/Help-with-Lucene-Indexer-crash-recovery-tf4572570.html#a13068939
...what's true for "kill -9" is true for hanking the power cord ... if the JVM isn't shut down cleanly, there is nothing Lucene or the JVM can do to guarantee that your index is in a consistent state.