Details
- Improvement
- Status: Resolved
- Minor
- Resolution: Duplicate
- 2.1
- None
- None
- New
Description
This is a spinoff of LUCENE-701.
If your directory resides on a "write once" filesystem (e.g. Hadoop), we need Lucene to have a mode where it doesn't write to the same file more than once, nor (I think?) do things like rewind a file to overwrite parts of it.
Lockless commits (LUCENE-701) get us closer to this goal because they always commit to a new segments_N+1 file (and new files for deletes/separate norms), but they still rewrite the "segments.gen" file. This file is often "optional" (it's only necessary if directory listings can be stale on the platform/filesystem).
The only other place I know of is CompoundFileWriter.close(). That method writes 0s into the header as placeholders, then rewinds and overwrites those 0s with the actual offsets into the compound file. I think (on quick inspection) that pre-computing the offsets and writing everything in one pass should be simple.
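To make the one-pass idea concrete, here is a minimal sketch in plain java.io. It is not the actual CompoundFileWriter code or the real compound file format: the class OnePassCompoundWriter, its methods, and the simplified layout (an int entry count, then a fixed-width long offset plus a name per entry, then the raw data) are all assumptions for illustration, and it assumes every sub-file's length is known before writing starts. Because each offset is a fixed-width long, the header size does not depend on the offset values, so it can be measured first and the real offsets written on the first and only pass.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: writes a simplified compound file without seeking back. */
public class OnePassCompoundWriter {

    /** Measures the header by encoding it with dummy offsets; the size is the same for any offset value. */
    static int headerSize(Iterable<String> names, int count) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        out.writeInt(count);
        for (String name : names) {
            out.writeLong(0L);   // dummy value, only its fixed 8-byte width matters
            out.writeUTF(name);
        }
        out.flush();
        return buf.size();
    }

    /** Computes where each sub-file's data will start: header size plus the lengths of all earlier sub-files. */
    static long[] computeOffsets(Map<String, byte[]> files) throws IOException {
        long[] offsets = new long[files.size()];
        long next = headerSize(files.keySet(), files.size());
        int i = 0;
        for (byte[] data : files.values()) {
            offsets[i++] = next;
            next += data.length;
        }
        return offsets;
    }

    /** Writes header and data strictly front to back; no rewind or overwrite is needed. */
    static void write(Map<String, byte[]> files, OutputStream sink) throws IOException {
        long[] offsets = computeOffsets(files);
        DataOutputStream out = new DataOutputStream(sink);
        out.writeInt(files.size());
        int i = 0;
        for (String name : files.keySet()) {
            out.writeLong(offsets[i++]);   // real offset, known before any data is written
            out.writeUTF(name);
        }
        for (byte[] data : files.values()) {
            out.write(data);
        }
        out.flush();
    }

    public static void main(String[] args) throws IOException {
        // LinkedHashMap keeps insertion order, so header entries and data line up.
        Map<String, byte[]> files = new LinkedHashMap<>();
        files.put("_1.fnm", new byte[] {1, 2, 3});
        files.put("_1.frq", new byte[] {4, 5});
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        write(files, sink);
        System.out.println("total bytes: " + sink.size());
    }
}
```

The sink here is a plain OutputStream with no seek method at all, which is the property a write-once filesystem would need.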
Does anyone know of other places that re-use filenames or rewind/seek and rewrite bytes?
We should create a "setWriteOnceMode()" or something like that.
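As a rough illustration of what such a mode could enforce, the sketch below tracks file names that have already been created and rejects both name reuse and seeks while the mode is on. WriteOnceGuard and its beforeCreate/beforeSeek hooks are hypothetical names invented here, not existing Lucene API; only the setWriteOnceMode name comes from the suggestion above, and where such checks would actually live is left open.

```java
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical guard: rejects file-name reuse and seeks while write-once mode is enabled. */
public class WriteOnceGuard {
    private final Set<String> created = new HashSet<>();
    private boolean writeOnce = false;

    public void setWriteOnceMode(boolean enabled) {
        writeOnce = enabled;
    }

    /** Intended to be called before an output is created for the given file name. */
    public void beforeCreate(String name) throws IOException {
        boolean firstTime = created.add(name);   // false if the name was seen before
        if (writeOnce && !firstTime) {
            throw new IOException("write-once mode: file already written: " + name);
        }
    }

    /** Intended to be called before an output is repositioned to overwrite earlier bytes. */
    public void beforeSeek(String name) throws IOException {
        if (writeOnce) {
            throw new IOException("write-once mode: seek not allowed on: " + name);
        }
    }
}
```

With the mode on, creating segments_1 and then segments_2 passes, while creating "segments.gen" a second time fails, which matches the rewrite described above.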