Issue Details (XML | Word | Printable)

Key: LUCENE-710
Type: Improvement Improvement
Status: Resolved Resolved
Resolution: Fixed
Priority: Minor Minor
Assignee: Michael McCandless
Reporter: Michael McCandless
Votes: 0
Watchers: 2
Operations

If you were logged in you would be able to see more operations.
Lucene - Java

Implement "point in time" searching without relying on filesystem semantics

Created: 14/Nov/06 07:46 PM   Updated: 15/Mar/07 10:31 PM
Return to search
Component/s: Index
Affects Version/s: 2.1
Fix Version/s: None

Time Tracking:
Not Specified

File Attachments:
  Size
Text File Licensed for inclusion in ASF works 710.review.diff 2007-03-15 10:08 AM Doron Cohen 25 kB
File Licensed for inclusion in ASF works 710.review.take2.diff 2007-03-15 12:14 PM Michael McCandless 26 kB
Text File Licensed for inclusion in ASF works LUCENE-710.patch 2007-03-02 03:52 PM Michael McCandless 162 kB
Text File Licensed for inclusion in ASF works LUCENE-710.take2.patch 2007-03-07 09:56 AM Michael McCandless 162 kB
Text File Licensed for inclusion in ASF works LUCENE-710.take3.patch 2007-03-09 09:44 AM Michael McCandless 163 kB

Lucene Fields: New
Resolution Date: 13/Mar/07 09:11 AM


 Description  « Hide
This was touched on in recent discussion on dev list:

http://www.gossamer-threads.com/lists/lucene/java-dev/41700#41700

and then more recently on the user list:

http://www.gossamer-threads.com/lists/lucene/java-user/42088

Lucene's "point in time" searching currently relies on how the
underlying storage handles deletion files that are held open for
reading.

This is highly variable across filesystems. For example, UNIX-like
filesystems usually do "close on last delete", and Windows filesystem
typically refuses to delete a file open for reading (so Lucene retries
later). But NFS just removes the file out from under the reader, and
for that reason "point in time" searching doesn't work on NFS
(see LUCENE-673 ).

With the lockless commits changes (LUCENE-701 ), it's quite simple to
re-implement "point in time searching" so as to not rely on filesystem
semantics: we can just keep more than the last segments_N file (as
well as all files they reference).

This is also in keeping with the design goal of "rely on as little as
possible from the filesystem". EG with lockless we no longer re-use
filenames (don't rely on filesystem cache being coherent) and we no
longer use file renaming (because on Windows it can fails). This
would be another step of not relying on semantics of "deleting open
files". The less we require from filesystem the more portable Lucene
will be!

Where it gets interesting is what "policy" we would then use for
removing segments_N files. The policy now is "remove all but the last
one". I think we would keep this policy as the default. Then you
could imagine other policies:

  • Keep past N day's worth
  • Keep the last N
  • Keep only those in active use by a reader somewhere (note: tricky
    how to reliably figure this out when readers have crashed, etc.)
  • Keep those "marked" as rollback points by some transaction, or
    marked explicitly as a "snaphshot".
  • Or, roll your own: the "policy" would be an interface or abstract
    class and you could make your own implementation.

I think for this issue we could just create the framework
(interface/abstract class for "policy" and invoke it from
IndexFileDeleter) and then implement the current policy (delete all
but most recent segments_N) as the default policy.

In separate issue(s) we could then create the above more interesting
policies.

I think there are some important advantages to doing this:

  • "Point in time" searching would work on NFS (it doesn't now
    because NFS doesn't do "delete on last close"; see LUCENE-673 )
    and any other Directory implementations that don't work
    currently.
  • Transactional semantics become a possibility: you can set a
    snapshot, do a bunch of stuff to your index, and then rollback to
    the snapshot at a later time.
  • If a reader crashes or machine gets rebooted, etc, it could choose
    to re-open the snapshot it had previously been using, whereas now
    the reader must always switch to the last commit point.
  • Searchers could search the same snapshot for follow-on actions.
    Meaning, user does search, then next page, drill down (Solr),
    drill up, etc. These are each separate trips to the server and if
    searcher has been re-opened, user can get inconsistent results (=
    lost trust). But with, one series of search interactions could
    explicitly stay on the snapshot it had started with.


 All   Comments   Work Log   Change History   Subversion Commits      Sort Order: Ascending order - Click to sort in descending order
Michael McCandless added a comment - 19/Jan/07 03:09 PM
There has been some great design discussions / iterations recently
on how to approach this:

http://www.gossamer-threads.com/lists/lucene/java-dev/44162

http://www.gossamer-threads.com/lists/lucene/java-dev/44236

I think we've iterated to a good approach now. Here's the summary:

  • First, add an option to IndexWriter to "commit (write segments_N)
    only on close" vs writing a segments_N every time there is a
    flush, merge, etc., during a single IndexWriter session.

This means a reader won't see anything a writer has been doing
until it's closed.

We would still have an "autoCommit" true/false (default true) to
keep backwards compatibility. If true, the IndexWriter writes a
new segments_N every time it flushes, merges segments, etc.; else
it only writes one on close.

We would add an "abort()" to IndexWriter to not commit, clean up
any temp files created, and rollback.

"Commit on close" will also address / enable fixes for other
issues like prevent readers from refreshing half way through
something like "bulk delete then bulk add", preventing readers
from refreshing during optimize() thus tying up lots of disk
space, enabling a write session to be transactional (all or
none), etc.

  • Second, change how IndexFileDeleter works: have it keep track of
    which commits are still live and which one is pending (as the
    SegmentInfos in IndexWriter, not yet written to disk).

Allow IndexFileDeleter to be subclassed to implement different
"deletion policies".

The base IndexFileDeleter class will use ref counts to figure out
which individual index files are still referenced by one or more
"segments_N" commits or by the uncommitted "in-memory"
SegmentInfos. Then the policy is invoked on commit (and also on
init) and can choose which commits (if any) to now remove.

Add constructors to IndexWriter allowing you to pass in your own
deleter. The default policy would still be "delete all past
commits as soon as a new commit is written" (this is how deleting
happens today).

For NFS we can then try different policies as discussed on those
threads above (there were at least 4 proposals). They all have
different tradeoffs. I would open separate issues for these
policies after this issue is resolved.


Marvin Humphrey added a comment - 19/Jan/07 06:25 PM
(This is a continuation of the discussion from one of the threads quoted in the previous comment, with some summations to provide context.)

Referring to a proposal to implement advisory locking where possible, and prevent index creation on volumes/systems where the delete mechanism fails, Michael McCandless wrote:

But what I don't like about it is it doesn't "gracefully degrade" to
the common NFS case where locking does not work. And, this is often
outside our user's control.

The stratagem of publicly exposing IndexFileDeleter does not "degrade gracefully", either.

If today's default deletions policy remains the default, then when an index is put on an NFS system, at some point search-time exceptions will occur after an update to an index. The user has to 1) detect this scenario without Lucene's help, probably from logs or user complaints 2) diagnose it without the aid of a meaningful error message, and 3) either implement their own IndexFileDeleter or choose an appropriate provided subclass, a task which will require that they grok both the innards of Lucene and the subtleties of NFS.

The first step to graceful degradation is a meaningful error message. That means detecting a problem which normally requires both a indexing process and a search process to trigger, so we have to simulate it artificially with a test.

1) Open a file for read/write and write a few bytes to it.
2) Delete the test file (without closing the handle).
3) Try to read from the handle and if a "stale NFS filehandle"
exception is caught, throw something more informative.

Our first opportunity to perform this test occurs at index-creation time. This is essentially cost free.

A second opportunity arises whenever deletions are performed. Here, there's a small cost involved, and it may not be worth it, as this would only catch cases where an index was copied onto an NFS volume rather than created there, then subsequently modified.

There's also another stratagem available to us: we can have IndexReaders establish advisory read locks against their relevant segments_N files on systems which support such locks. This won't help us to "degrade gracefully". However, it will help us degrade less often, as modern versions of NFS support file locking - but still do NOT support the "delete-on-last-close" mechanism Lucene depends on.

Implementing advisory read locks is orthogonal to the addition of IndexFileDeleter, but diminishes the justification for adding it as a public API.

But the good news is since we will allow subclassing to make your own
deletion policy, we can eventually do both of these approaches and our
users can pick one or do their own.

The number of users this class will serve is diminishingly small. Other mitigation strategies are available.

1) If we implement advisory read locks, many people who see this error will no longer see it. For those who do, the best option is to upgrade the OS to a version which supports advisory locks over NFS. Then an index on an NFS volume will behave as any other.
2) If you don't actually need to put the index on an NFS volume, put it somewhere else.
3) Catch stale NFS filehandle exceptions in your search application and refresh the reader when they occur.
4) Maintain two copies of an index and do an rsync/switch.
5) Hack Lucene.

Flexibility is not free. There have been recent lamentations on java-dev about how difficult it will be to merge the write interfaces of IndexReader and IndexWriter to provide a single, unified class through which all index modifications can be performed. The exposure of the IndexFileDeleter mechanism contributes to this problem – it's one more small step in the wrong direction.

Providing a subclassing/callback API is often an elegant strategy, and it is surely better in this case than it would be to provide a list of deletion policies for the user to select from. However, whenever possible, no API is always a better solution – especially in a case like this one, where the functionality provided has nothing to do with Lucene's core mission and is there solely to work around an implmentation-specific bug.


Marvin Humphrey added a comment - 19/Jan/07 06:33 PM
There are two sections in the previous comment that are supposed to be quoted, but which are not because JIRA ate the email-style "greater than" quoting. They are:

"But what I don't like about it is it doesn't "gracefully degrade" to
the common NFS case where locking does not work. And, this is often
outside our user's control."

"But the good news is since we will allow subclassing to make your own
deletion policy, we can eventually do both of these approaches and our
users can pick one or do their own."

I wish it were possible to edit that note, as it's really confusing as things stand. A preview mechanism would have been handy.


Doron Cohen added a comment - 19/Jan/07 06:55 PM
> * Second, change how IndexFileDeleter works: have it keep track of
> which commits are still live and which one is pending (as the
> SegmentInfos in IndexWriter, not yet written to disk).
>
> Allow IndexFileDeleter to be subclassed to implement different
> "deletion policies".
>
> The base IndexFileDeleter class will use ref counts to figure out
> which individual index files are still referenced by one or more
> "segments_N" commits or by the uncommitted "in-memory"
> SegmentInfos. Then the policy is invoked on commit (and also on
> init) and can choose which commits (if any) to now remove.
>
> Add constructors to IndexWriter allowing you to pass in your own
> deleter. The default policy would still be "delete all past
> commits as soon as a new commit is written" (this is how deleting
> happens today).
>
> For NFS we can then try different policies as discussed on those
> threads above (there were at least 4 proposals). They all have
> different tradeoffs. I would open separate issues for these
> policies after this issue is resolved.
>

This ties solving the NFS issue with an extendable-file-deletion policy.
I am wondering is this the right way, or, perhaps, should the reference
counting be considered alone, apart from the deletion policy.
(Would modifying IndexFileDeleter to base on ref-count make it simpler
or harder to maintain?)

Also, IndexFileDeleter is doing delicate work - not sure you want
applications to mess with it. Better let applications control some
simple well defined behavior, maybe the same way that a sorter
allows applications to provide a comparator, but keeps the sorting
algorithm for itself.

Back to reference counting,- how about the following approach:

  • Add to Directory a FileReferenceCounter data member, get()/set() etc.
  • Add a class FileReferenceCounter with simple general methods:
    void increment (String name)
    void decrement (String name)
    int getRefCount (String name)
  • Default implementation would do nothing, i.e. would not record
    references, and always return 0.
  • IndexReader, upon opening a segment, would call increment(segName)
  • IndexReader, upon closing a segment, would call decrement(segName)
  • IndexFileDeleter, before removing a file belonging to a certain segment,
    would verify getRefCount(segName)==0.
  • Notice that the FilereferenceCounter is available from the Directory,
    so no constructors should be added to IndexWriter/Reader.

So, this is adding to Directory a general file utility, no knowledge of
index structure required in Directory. Also, IndexFileDeleter can remain
as today, and at some later point can be made more powerful with various
deletion policies - but those policies remain unrelated to the NFS
issue - they can focus on point-in-time issues, where I think it
stemmed from.

An NFS geared FileReferenceCounter would then be able to keep alive
"counter files", name those files based on counted fileName plus
processID plus machID, base getRefCount on safety window since file
was last touched, etc. All this is left out from point-in-time
policies (how many/time points-in-time should be retained).


Michael McCandless added a comment - 19/Jan/07 10:03 PM
OK, a few top level summary comments and then some specifics below:
  • I don't want to modify the Lucene core (adding advisory read
    locks, etc) just to handle the NFS case. That risks hurting the
    non-NFS cases. "First do no harm." "Keep it simple." We've been
    working hard to remove locking lately (lockless commits) and I
    rather not add more locking back.

By implementing each approach (I think there are now 5 different
ideas now) instead as its own deletion policy (subclass of
IndexFileDeleter) we contain the added complexity of locking,
file-based ref counts, etc, to just that one subclass of
IndexFileDeleter.

  • I think NFS support is part of Lucene's core mission.

Lucene should try hard to be as portable as possible.

NFS is used alot.

It's tempting to tell users to "upgrade OS", "upgrade NFS server
and/or client", etc, but taking that approach will only hurt our
users because typically this is not something they can control.

Now, if we had to bend over backwards for NFS, then I would agree
it's not worth it. But, we don't have to: by allowing custom
deletion policies (which is a minor change) we can try out all of
the approaches suggested so far on this thread.

Rather than baking any one of these approaches into the Lucene
core, I'd rather just enable "custom deletion policies" then
people can build out these polices outside of the core (eg in
"contrib" first).

  • I agree that "giving a good error message when index is on NFS" is
    really important and that custom deletion policy alone doesn't
    address this.

Marvin I don't think your test will work either (see below).

But I really like this direction: does anyone know how (Java
friendly way) to determine that a given directory is on an NFS
mount? That would be wonderful. I will spin off a new thread
here.

Some specifics below:

Marvin Humphrey wrote:

> The first step to graceful degradation is a meaningful error message.
That means detecting a problem which normally requires both a indexing
process and a search process to trigger, so we have to simulate it
artificially with a test.
>
> 1) Open a file for read/write and write a few bytes to it.
> 2) Delete the test file (without closing the handle).
> 3) Try to read from the handle and if a "stale NFS filehandle"
> exception is caught, throw something more informative.
>
> Our first opportunity to perform this test occurs at index-creation
time. This is essentially cost free.
>
> A second opportunity arises whenever deletions are performed. Here,
there's a small cost involved, and it may not be worth it, as this would
only catch cases where an index was copied onto an NFS volume rather
than created there, then subsequently modified.

I think this test won't work (though I haven't tested...). Typically
an NFS client will catch this case and locally emulate "delete on last
close". Worse, even if it doesn't, those bytes would likely be cached
and then would fail to hit "stale NFS filehandle".

> But the good news is since we will allow subclassing to make your own
> deletion policy, we can eventually do both of these approaches and our
> users can pick one or do their own.
>
> The number of users this class will serve is diminishingly small.
Other mitigation strategies are available.
>
> 1) If we implement advisory read locks, many people who see this
error will no longer see it. For those who do, the best option is to
upgrade the OS to a version which supports advisory locks over NFS.
Then an index on an NFS volume will behave as any other.
> 2) If you don't actually need to put the index on an NFS volume, put
it somewhere else.
> 3) Catch stale NFS filehandle exceptions in your search application
and refresh the reader when they occur.
> 4) Maintain two copies of an index and do an rsync/switch.
> 5) Hack Lucene.

5) isn't really a good option since we all can't even agree how to
"hack Lucene" to make this work! 1) I think is too dangerous as part
of the core. 2) typically this is not an option. People choose NFS
because they want to share the index. 4) is a fair amount of added
complexity. 3) is the most viable option I see here, but it's not
great because you're forced to refresh "right now". What if warming
takes 8 minutes? What if "now" is a bad time because deletes were
done by the writer but not yet adds?

> Flexibility is not free. There have been recent lamentations on
java-dev about how difficult it will be to merge the write interfaces of
IndexReader and IndexWriter to provide a single, unified class through
which all index modifications can be performed. The exposure of the
IndexFileDeleter mechanism contributes to this problem – it's one more
small step in the wrong direction.

Yes there is an open question now on what to do about the confusion on
using IndexReader vs IndexWriter. I think moving towards "use
IndexWriter for changes, use IndexReader for reading" is the best
solution here. But I don't see how this relates to allowing
subclassing of IndexFileDeleter to make your own deletion policy.

> Providing a subclassing/callback API is often an elegant strategy,
and it is surely better in this case than it would be to provide a list
of deletion policies for the user to select from. However, whenever
possible, no API is always a better solution – especially in a case
like this one, where the functionality provided has nothing to do with
Lucene's core mission and is there solely to work around an
implmentation-specific bug.

I disagree on this point ("no" API is better than subclassing). As
you've said, this issue won't affect that many people (though I think
it's a fairly large subset of our users). Given that, I would not
want to add file locking & additional complexity into the Lucene core,
just to handle NFS.

By allowing a different delete policy as a subclass of
IndexFileDeleter we keep the changes required for supporting NFS way
outside the Lucene core. Since there's so much debate about which
deletion policy is best we should create all of these in contrib to
begin with and if something proves reliable we can eventually promote
it into core Lucene.

I think subclassing is perfect for this sort of situation. It's like
the various LockFactory implementations we have: there is no "one size
fits all".

Mike


Michael McCandless added a comment - 19/Jan/07 10:09 PM

Doron Cohen wrote:

> This ties solving the NFS issue with an extendable-file-deletion policy.
> I am wondering is this the right way, or, perhaps, should the reference
> counting be considered alone, apart from the deletion policy.
> (Would modifying IndexFileDeleter to base on ref-count make it simpler
> or harder to maintain?)
>
> Also, IndexFileDeleter is doing delicate work - not sure you want
> applications to mess with it. Better let applications control some
> simple well defined behavior, maybe the same way that a sorter
> allows applications to provide a comparator, but keeps the sorting
> algorithm for itself.

The solution I have in mind abstracts away all tricky details of
deleting files. EG something like:

public class OnlyLastCommitDeleter extends IndexFileDeleter {

void onInit(List commits) { onCommit(commits); }

void onCommit(List commits) {
if (commits.size() > 1) {
for(int i=0;i<commits.size()-1;i++) { deleteCommit(commits.get(i)); }
}
}

Ie, the sole responsibility of the IndexFileDeleter subclass (policy)
is to decide when to delete a commit. The rest of the details
(figuring out what actual files can be deleted now that a given commit
segments_N is deleted) are handled by the base class (with in-memory
ref counting).

> Back to reference counting,- how about the following approach:
> - Add to Directory a FileReferenceCounter data member, get()/set() etc.
> - Add a class FileReferenceCounter with simple general methods:
> void increment (String name)
> void decrement (String name)
> int getRefCount (String name)
> - Default implementation would do nothing, i.e. would not record
> references, and always return 0.
> - IndexReader, upon opening a segment, would call increment(segName)
> - IndexReader, upon closing a segment, would call decrement(segName)
> - IndexFileDeleter, before removing a file belonging to a certain segment,
> would verify getRefCount(segName)==0.
> - Notice that the FilereferenceCounter is available from the Directory,
> so no constructors should be added to IndexWriter/Reader.
>
> So, this is adding to Directory a general file utility, no knowledge of
> index structure required in Directory. Also, IndexFileDeleter can remain
> as today, and at some later point can be made more powerful with various
> deletion policies - but those policies remain unrelated to the NFS
> issue - they can focus on point-in-time issues, where I think it
> stemmed from.
>
> An NFS geared FileReferenceCounter would then be able to keep alive
> "counter files", name those files based on counted fileName plus
> processID plus machID, base getRefCount on safety window since file
> was last touched, etc. All this is left out from point-in-time
> policies (how many/time points-in-time should be retained).

I think this approach could work, but, rather than implementing in the
Lucene core (adding methods to Directory) I'd like to see it tested as
a custom deletion policy + wrappers around IndexReader
creation/destruction.

We have so much debate about the best "deletion policy" for NFS that
I'd like to make the minimal extension to the core (ability to make
your own "deletion policy") and then people can build their own and
try them out.

Mike


Michael McCandless added a comment - 20/Jan/07 10:12 AM
One clarification on "different deletion policies": to support "commit
on close" in IndexWriter, I already have to improve IndexFileDeleter
to give it a different deletion policy than the current one.

Specifically, the deleter must not delete anything referenced by the
last commit nor anything referenced by the in-memory SegmentInfos.

For example, if a writer is opened with autoCommit=false ("commit on
close") on an existing index, and lots of docs are added/deleted/etc,
there will have been flushes & merges of segments. The deletion
policy should not delete anything that existed "at the start" because
it's referenced by the segments_N commit, nor anything that is now
referenced by the in-memory SegmentInfos. But it should delete
anything "in between" (any newly written segments that have now been
merged away).

To the deleter this would just be a different policy, one that keeps
two SegmentInfos alive (one on disk and one not yet committed, in
memory). And the default deletion policy would be selected depending
on whether the writer is in autoCommit mode or not.


Marvin Humphrey added a comment - 22/Jan/07 01:22 AM

On Jan 19, 2007, at 2:04 PM, Michael McCandless (JIRA) wrote:
> * I think NFS support is part of Lucene's core mission.

When I asserted that IndexFileDeleter had nothing to do with Lucene's core
mission, I meant: you don't use Lucene to build yourself an app which helps
you delete files.

> Yes there is an open question now on what to do about the confusion on using
> IndexReader vs IndexWriter. I think moving towards "use IndexWriter for
> changes, use IndexReader for reading" is the best solution here. But I
> don't see how this relates to allowing subclassing of IndexFileDeleter to
> make your own deletion policy.

They're hard to refactor because they're both big, hence adding either code or
API commitments to them should be avoided when possible. We're in agreement
about the desirability of simplicity. We part ways in how we measure
simplicity: I give greater emphasis to simplicity of API design.

> I disagree on this point ("no" API is better than subclassing).

We're talking past each other. I was generalizing: a 100% successful, purely
internal "black box" solution is always better than a solution that involves
the user.

> I would not want to add file locking & additional complexity into the Lucene
> core, just to handle NFS.

This is where our differing priorities manifest. I would rather add some
code complexity to the Lucene "core" than accept the increased
support burden of an expanded API.

Ironically, though you consider supporting NFS "part of Lucene's core
mission", for the average user your proposal as it stands is not very
user-friendly. People like Chuck, Doron, and Robert will have no trouble with
it, but if you're a newcomer to Lucene and you "just want to put an index on
NFS", subclassing IndexFileDeleter will pose a challenge.

I also think you may be over-estimating the amount of effort it will take to
exploit advisory read locks. (The vexing problem of how to issue a warning
when index creation is attempted on an NFS volume is orthogonal to the
read-locks approach as well.) They should be easy in KS; I'll know soon enough.
However, there are some OO discipline issues which complicate applying what I
have in mind to Java Lucene. In KS, the public API is defined solely via
documentation, so I can have code in Index call methods from Store without
having to expose it. With Lucene divided into multiple packages, that's a
problem.

> I think subclassing is perfect for this sort of situation.

I'm not so enamored of subclassing. It's less constraining than some other
approaches, but it's still constraining.

Case in point: it's not possible to provide a subclass of IndexFileDeleter
which exploits advisory read locking under your current proposal.

In theory, your proposal even prevents the addition of such read locks to
Lucene later, because doing so could conflict with a deletions policy you've
allowed the user to set. (; Given that locks over NFS make you "nervous",
perhaps you consider foreclosing that option a feature.

> It's like the various LockFactory implementations we have: there is no "one
> size fits all".

I don't think LockFactory ought to be exposed either.

Reading from an index – any index, on any volume – should Just Work.
Writing to an index from a single IndexWriter should Just Work. In a perfect
world, writing from multiple writers simultaneously would Just Work, too, but
as that's inherently impractical given Lucene's current design, opening a
second writer must fail. That failure should be the only visible evidence
that a locking mechanism even exists.

In my view, any deviance from that ideal API due to implementation defects
should be handled with exceptions rather than API additions.

In keeping with this philosophy, Lock is not publicly exposed in KinoSearch.
In fact, nothing about the locking mechanism is publicly exposed. So far,
there seem to be three bugs with the current implementation:

  • Stale NFS Filehandle exceptions.
  • Stale lock files interfering with unattended indexing sessions. I plan
    to mitigate this by moving to advisory write locks when possible.
  • Multiple machines can cause index corruption when attempting to write
    simultaneously to a shared volume. Moving the write.lock file to the
    index directory, as enabled by lockless commits, solves this problem.

Once advisory file locks are in place, and if they work as expected under
recent versions of NFS, I expect no further problems under any common, recent
Unix.

With the switch to lockless commits in KinoSearch, I've able to refactor Lock
and eliminate all of Lock's subclasses, simplifying the KinoSearch "core".
"No more subclassing of Lock" was originally a line-item in my list of
"great stuff" about lockless commits, but I had to take it out because it
wasn't true for Lucene!

With Otis signing on to your solution, it looks like momentum is gathering for
it. For the record, I don't think it's a catastrophic change, just suboptimal
and IMO not ready for addition until improved.

I think you can do better.


Michael McCandless added a comment - 22/Jan/07 11:30 PM
Quick summary first:

OK, as you said (and I agree) I think we just have a difference of
opinion on what's the "lesser evil" tradeoff here. You would prefer to
change the core of KinoSearch to always use advisory read locks for
all indices.

Whereas I prefer to leave the Lucene core untouched since things work
fine in most cases today ("if it ain't broke don't fix it"), and then
open up an API so for those cases (NFS) where it doesn't work, users
at least have possible solutions to try.

I think you also have a high confidence that the locking approach will
work fine (be perfect) on the first go and will not alienate too many
users, but I don't: I have had problems with locking in the past and I
think most users don't have the freedom to "upgrade OS/fileserver".

So I would prefer instead to open a minimal API in the core of Lucene
(so users can use different deletion policies), and then try the 5
different ideas proposed so far (and more later I'm sure) as their own
deletion policy, external to Lucene's core (eg in contrib). If one of
them proves to work well, universally, then sometime down the road we
can promote it as the default deletion policy.

OK details below:

> > * I think NFS support is part of Lucene's core mission.
>
> When I asserted that IndexFileDeleter had nothing to do with Lucene's core
> mission, I meant: you don't use Lucene to build yourself an app which helps
> you delete files.

Well, "custom deletion policies" is in support of the core mission of
"working over NFS".

> > Yes there is an open question now on what to do about the confusion on using
> > IndexReader vs IndexWriter. I think moving towards "use IndexWriter for
> > changes, use IndexReader for reading" is the best solution here. But I
> > don't see how this relates to allowing subclassing of IndexFileDeleter to
> > make your own deletion policy.
>
> They're hard to refactor because they're both big, hence adding either code or
> API commitments to them should be avoided when possible. We're in agreement
> about the desirability of simplicity. We part ways in how we measure
> simplicity: I give greater emphasis to simplicity of API design.
>
> > I disagree on this point ("no" API is better than subclassing).
>
> We're talking past each other. I was generalizing: a 100% successful, purely
> internal "black box" solution is always better than a solution that involves
> the user.

OK, yes in the ideal case, no API is better than API if your situation
allows for no API. I just don't think this is one of those
situations: I don't think we have a clear cut "one size fits all"
solution.

> > I would not want to add file locking & additional complexity into the Lucene
> > core, just to handle NFS.
>
> This is where our differing priorities manifest. I would rather add some
> code complexity to the Lucene "core" than accept the increased
> support burden of an expanded API.
>
> Ironically, though you consider supporting NFS "part of Lucene's core
> mission", for the average user your proposal as it stands is not very
> user-friendly. People like Chuck, Doron, and Robert will have no trouble with
> it, but if you're a newcomer to Lucene and you "just want to put an index on
> NFS", subclassing IndexFileDeleter will pose a challenge.

Yes, but at least having the option (picking one of the deletion
policies in "contrib" once we've built them out) is quite a bit better
than what we have today (no option besides "you must refresh now"). I
would love to have the "perfect" solution (which you are aiming for in
one step), but I'll settle today for just good progress: "progress not
perfection".

> I also think you may be over-estimating the amount of effort it will take to
> exploit advisory read locks. (The vexing problem of how to issue a warning
> when index creation is attempted on an NFS volume is orthogonal to the
> read-locks approach as well.) They should be easy in KS; I'll know soon enough.
> However, there are some OO discipline issues which complicate applying what I
> have in mind to Java Lucene. In KS, the public API is defined solely via
> documentation, so I can have code in Index call methods from Store without
> having to expose it. With Lucene divided into multiple packages, that's a
> problem.

Yes detection of NFS is orthogonal and I would love to find a solution
here. And yes Java's method/field protection is quite different from
what KS can do.

> > I think subclassing is perfect for this sort of situation.
>
> I'm not so enamored of subclassing. It's less constraining than some other
> approaches, but it's still constraining.
>
> Case in point: it's not possible to provide a subclass of IndexFileDeleter
> which exploits advisory read locking under your current proposal.
>
> In theory, your proposal even prevents the addition of such read locks to
> Lucene later, because doing so could conflict with a deletions policy you've
> allowed the user to set. (; Given that locks over NFS make you "nervous",
> perhaps you consider foreclosing that option a feature.

No, I would not consider foreclosing that option a feature!

Yes, I am nervous about relying on advisory read locks 100% today in
the Lucene core. But, I would love to be proven wrong in the future:
if your lock based solution actually works out "perfectly" in the
future then users can indeed fire up Lucene/KS regardless of what
filesystem the index is on. I would equally love to see file-based
reference counting work out, etc: if any option proves reliable enough
in the future then we can make it the default deletion policy.

Yes users who set their own deletion policies would not get this
default but that's OK: such users understand what they've done.

And, I don't want to change the default policy now ("first do no
harm").

> > It's like the various LockFactory implementations we have: there is no "one
> > size fits all".
>
> I don't think LockFactory ought to be exposed either.
>
> Reading from an index – any index, on any volume – should Just Work.
> Writing to an index from a single IndexWriter should Just Work. In a perfect
> world, writing from multiple writers simultaneously would Just Work, too, but
> as that's inherently impractical given Lucene's current design, opening a
> second writer must fail. That failure should be the only visible evidence
> that a locking mechanism even exists.
>
> In my view, any deviance from that ideal API due to implementation defects
> should be handled with exceptions rather than API additions.
>
> In keeping with this philosophy, Lock is not publicly exposed in KinoSearch.
> In fact, nothing about the locking mechanism is publicly exposed. So far,
> there seem to be three bugs with the current implementation:
>
> * Stale NFS Filehandle exceptions.
> * Stale lock files interfering with unattended indexing sessions. I plan
> to mitigate this by moving to advisory write locks when possible.
> * Multiple machines can cause index corruption when attempting to write
> simultaneously to a shared volume. Moving the write.lock file to the
> index directory, as enabled by lockless commits, solves this problem.
>
> Once advisory file locks are in place, and if they work as expected under
> recent versions of NFS, I expect no further problems under any common, recent
> Unix.
>
> With the switch to lockless commits in KinoSearch, I've able to refactor Lock
> and eliminate all of Lock's subclasses, simplifying the KinoSearch "core".
> "No more subclassing of Lock" was originally a line-item in my list of
> "great stuff" about lockless commits, but I had to take it out because it
> wasn't true for Lucene!
>
> With Otis signing on to your solution, it looks like momentum is gathering for
> it. For the record, I don't think it's a catastrophic change, just suboptimal
> and IMO not ready for addition until improved.

I agree it would be great to reach this perfect world. It would be
even better to get there in just one jump from where we are today.
It's just not nearly clear to me that a locking solution (or reference
counting, time based expiration, etc.) for NFS is or will evolve to
that pefect solution. And I think "not alienating users" who are
stuck on past UNIX versions is more important than "not adding any
API". I think we are just picking a different "lesser evil".

> I think you can do better.

With time, I hope so too. Progress not perfection!


Doron Cohen added a comment - 23/Jan/07 04:25 AM
Michael McCandless wrote:

> The solution I have in mind abstracts away all tricky details of
> deleting files. EG something like:
>
> public class OnlyLastCommitDeleter extends IndexFileDeleter {
>
> void onInit(List commits) { > onCommit(commits); > }
>
> void onCommit(List commits) {
> if (commits.size() > 1) {
> for(int i=0;i<commits.size()-1;i++) { > deleteCommit(commits.get(i)); > }
> }
> }
>
> Ie, the sole responsibility of the IndexFileDeleter subclass (policy)
> is to decide when to delete a commit. The rest of the details
> (figuring out what actual files can be deleted now that a given commit
> segments_N is deleted) are handled by the base class (with in-memory
> ref counting).
>

I don't really understand this interface and so I cannot see how
you intend to rewrite the IndexFileDeleter as you describe, but I
agree that if this can be done it is a better solution. So I am
okay with waiting for this approach to mature into code.

(I would prefer the DeletionPolicy to be a
pluggable interface and the IndexFileDeleter to be
an internal class, so that at least we do not expose now something
that would stand in our way in the future. But again, since I do not
fully understand your solution maybe please bear with me if this is
not making sense.)


Marvin Humphrey added a comment - 23/Jan/07 03:09 PM
I found a flaw in my plan. If the read locks are always applied, they will
slow deletion of obsolete segments for everybody where delete-on-last-close
currently works as intended. Right now, the files are unlinked and disappear
as soon as the last reader holding them open goes away. With read locks, the
unlink op wouldn't take place if a reader was open.

I also spent a good bit of time yesterday further researching the subtleties
of locks over NFS. Summing up: flock can work, but dot-lock files are more
reliable.

So, new proposal:

Add a new public method IndexReader.aquireReadLock(String hostId), which would
write a dot-lock file to the index directory with hostId, a pid, and an
incrementing integer spliced into the file name. The relevant segments_N file
name would be written to the lockfile. Calling it would only be necessary on NFS,
and an exception would occur if the attempt to create the lockfile
failed.

The deletions policy would work as in my earlier proposal, and the user
wouldn't have to grok its innards. Troubleshooting stale lockfiles
by snooping the index directory contents would be straightforward and
intuitive.

We might want aquireReadLock() to automatically zap any locks associated with
hostId for which the pid couldn't be found, or we might want to break that out
into another method.


Michael McCandless added a comment - 23/Jan/07 10:00 PM
> I don't really understand this interface and so I cannot see how you
> intend to rewrite the IndexFileDeleter as you describe, but I agree
> that if this can be done it is a better solution. So I am okay with
> waiting for this approach to mature into code.

The deletion policy is called on creation of a writer (onInit) and
once per commit (onCommit) and is given a List of existing commits (=
SegmentInfos instances) in the index. The policy then decides which
commits should be removed and IndexFileDeleter translates that request
(using reference counting, because a given index file may still be
reference by commits that are not yet deleted) into which specific
files to remove.

For example, onCommit you would typically see a List of length 2: the
prior commit and the new one. And the default policy
(KeepOnlyLastCommit) would at this point remove the prior one.

Realize that the "commit on close" mode (autoCommit=false) for
IndexWriter (that I'm doing as part of this issue) actually keeps 2
SegmentInfos alive at any given time: first is the segments_N file in
the index, and second is the "in memory" SegmentInfos that haven't yet
been committed to a segments_N file. It's only on close when the
commit takes place that the deleter then deletes the previous
segments_N commit.

> (I would prefer the DeletionPolicy to be a pluggable interface and
> the IndexFileDeleter to be an internal class, so that at least we
> do not expose now something that would stand in our way in the
> future. But again, since I do not fully understand your solution
> maybe please bear with me if this is not making sense.)

Good point: I agree an interface here is cleaner. I will use an
interface (not subclass) and make IndexFileDeleter entirely internal.
The deletion policy doesn't need to see any details of the
IndexFileDeleter class.


Michael McCandless added a comment - 23/Jan/07 10:18 PM

> I found a flaw in my plan. If the read locks are always applied,
> they will slow deletion of obsolete segments for everybody where
> delete-on-last-close currently works as intended. Right now, the
> files are unlinked and disappear as soon as the last reader holding
> them open goes away. With read locks, the unlink op wouldn't take
> place if a reader was open.

Ahh good point. This is why I don't want to risk changes to Lucene
core: most of the time Lucene's "point in time" searching works
perfectly now. It's just NFS (so far) that's problematic which is why
I want to keep the solution "external" to Lucene by allowing custom
deletion policies. Plus we obviously have alot of deletion policies
to try on NFS. First do no harm.

> I also spent a good bit of time yesterday further researching the
> subtleties of locks over NFS. Summing up: flock can work, but
> dot-lock files are more reliable.

Well, dot-locks (= "create file exclusively") have problems too.
There have been issues with bugs in at least certain Linux NFS
clients. And Sun's Javadocs on the equivalent Java method,
File.createNewFile, has a warning about not relying on this for
locking:

http://java.sun.com/j2se/1.4.2/docs/api/java/io/File.html#createNewFile()

This warning is why we created the NativeFSLockFactory for Directory
locking in the first place.

> So, new proposal:
>
> Add a new public method IndexReader.aquireReadLock(String hostId),
> which would write a dot-lock file to the index directory with
> hostId, a pid, and an incrementing integer spliced into the file
> name. The relevant segments_N file name would be written to the
> lockfile. Calling it would only be necessary on NFS, and an
> exception would occur if the attempt to create the lockfile failed.
>
> The deletions policy would work as in my earlier proposal, and the
> user wouldn't have to grok its innards. Troubleshooting stale
> lockfiles by snooping the index directory contents would be
> straightforward and intuitive.
>
> We might want aquireReadLock() to automatically zap any locks
> associated with hostId for which the pid couldn't be found, or we
> might want to break that out into another method.

OK. You could implement this in Lucene as a custom deletion policy
once we get this commmitted (I think this is 6 proposals now for
"deletion policy" for NFS), plus a wrapper around IndexReader.


Marvin Humphrey added a comment - 24/Jan/07 05:55 AM
On Jan 23, 2007, at 2:19 PM, Michael McCandless (JIRA) wrote:

> First do no harm.

If that was really your guiding philosophy, you would never change anything.

> And Sun's Javadocs on the equivalent Java method, File.createNewFile, has a
> warning about not relying on this for locking:
>
> http://java.sun.com/j2se/1.4.2/docs/api/java/io/File.html#createNewFile()

That page recommends that you use FileLock instead, which maps to Fcntl on
some systems. The FreeBSD manpage on Fcntl uses less delicate language than
Sun in pointing out the drawbacks:

This interface follows the completely stupid semantics of System V and
IEEE Std 1003.1-1988 (``POSIX.1'') that require that all locks associated
with a file for a given process are removed when any file descriptor for
that file is closed by that process. This semantic means that applica-
tions must be aware of any files that a subroutine library may access.

Trying to guarantee that kind of discipline from library code severely limits
your options.

> This warning is why we created the NativeFSLockFactory for Directory locking
> in the first place.

Take a look at this bug, which explains how that warning got added.

http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4676183

Read the comment below – the problem with the "protocol" they warn you
against using is with deleteOnExit(), not createNewFile(). I think you're
better off with dot-locks.

> OK. You could implement this in Lucene as a custom deletion policy once we
> get this commmitted (I think this is 6 proposals now for "deletion policy"
> for NFS), plus a wrapper around IndexReader.

This was the response I got on the KinoSearch list:

We do not enable NFS writes, only reads (which is why Slashdot is able to
reliably use NFS for its heavy load . So I don't think that will work,
if I understand you correctly.

Lack of bulletproof support for NFS ain't gonna hold up my next release any
longer. What a freakin' nightmare...


Doron Cohen added a comment - 24/Jan/07 07:00 AM
> The deletion policy is called on creation of a writer (onInit) and
> once per commit (onCommit) and is given a List of existing commits (=
> SegmentInfos instances) in the index. The policy then decides which
> commits should be removed and IndexFileDeleter translates that request
> (using reference counting, because a given index file may still be
> reference by commits that are not yet deleted) into which specific
> files to remove.
> ...

Michael thanks for explaining this further - yes, now it makes sense to me.


Michael McCandless added a comment - 02/Mar/07 03:52 PM
OK, I've attached a patch to implement "commit on close" and "custom
deletion policies". The design is exactly what's described above.

There are no changes to the file format.

All tests pass and I've added additional tests for this new
functionality.

Summary of the external changes:

  • For "commit on close":
  • Added new IndexWriter constructors that take "autoCommit"
    boolean: if it's false, then readers will not see any actions
    done by this writer (no new segments_N is written) until
    writer.close() is called.
  • Added IndexWriter.abort() which closes the writer without
    committing, cleaning up any temp files it had added to the
    index.
  • For "custom deletion policies":
  • Created IndexDeletionPolicy interface and added constructors to
    IndexReader/IndexWriter allowing you to specify a deletion
    policy.
  • Created IndexCommitPoint interface: this is passed to the
    deletion policy to represent each commit. The policy calls the
    delete method on this interface to remove a commit.
  • Created one deletion policy (KeepOnlyLastCommitDeletionPolicy)
    and made that the default policy. (The unit test for this has
    other "interesting" policies like "delete by age since this
    commit was obsoleted" initially discussed on java-dev.)

Summary of internal changes:

  • Created "files()" method in SegmentInfo (and changed
    SegmentReader.files() to use it).
  • Changed IndexFileDeleter to use reference counting to keep track
    of which files are deletable because no commit(s) (nor the
    in-memory SegmentInfos) reference them.

This is a nice simplification of IndexFileDeleter: previously it
had detailed knowledge about which files, extensions, etc., to
look for and delete. Now it has far less of that because it
relies entirely on SegmentInfo.files() to compute that.

  • Changed IndexReader/IndexWriter to not directly delete files and
    instead notify IndexFileDeleter when there has been a change to
    the in-memory SegmentInfos. The deleter then incref/decref's to
    determine what files can safely be deleted.

This is also a nice simplification for the same reason as above:
now the writers just make changes to SegmentInfo(s) without having
to compute/track the consequences to specific index files.

  • Simplified the fix for LUCENE-702 (addIndexes corrupts index on
    disk full) to just temporarily set autoCommit=false if it's not
    already.
  • Added get/setDefaultInfoStream to IndexWriter so you could see
    things that happen during IndexWriter constructor.
  • No longer store/propogate persistent IndexFileDelter inside
    IndexReader (removed protected method get/setDeleter). This is a
    nice simplification because the deleter is now only needed briefly
    during "commit()".
  • Reworked the toplevel javadoc for IndexWriter.
  • Added try/finally to remove a partially written segments_N if we
    hit IOException when trying to write it.
  • Other small change (small refactoring, fixes to javadocs, fixed
    spelling, etc).

Michael McCandless added a comment - 07/Mar/07 09:56 AM
Rebased the patch to the current trunk. I plan to commit this probably end of this week.

Doron Cohen added a comment - 09/Mar/07 06:26 AM
Mike, patching take2 on current trunk fails for IndexFileDeleter.java.
patching file src/java/org/apache/lucene/index/IndexFileDeleter.java
Hunk #1 FAILED at 18.
Also some noise in SegmentInfo.java
patching file src/java/org/apache/lucene/index/SegmentInfo.java
Hunk #7 succeeded at 291 (offset 3 lines).

Michael McCandless added a comment - 09/Mar/07 09:44 AM
Woops, looks like the commit for LUCENE-825 messed up the patch. OK I updated and re-diff'd and attached take3.

It's too bad we don't have a patch that's better integrated with svn such that if even you have a more recent svn revision checked out, applying the patch would do so back against the revision it was based on, and then svn would merge the changes committed to the trunk since then. In this case an svn update on the checkout with the diffs produced no conflicts, so if we had such a combined patch tool, it would have worked find here. I suppose the person applying the patch could first "svn update" to its base revision, apply the patch, then svn up, but that's kind of a hassle


Doron Cohen added a comment - 15/Mar/07 10:08 AM
I was too slow in reviewing this, so while I was studying the new code it was committed...

Anyhow I have a few comments and a question - I think JIRA LUCENE-710 is still the place for this discussion even though the issue is already resolved.

The attached 710.comments.diff implements a few suggested changes.

I like the definition and use of IndexDeletePolicy and CommitPoint - this is very flexible and clear, and would indeed allow to implement NFS suited logic. These two concepts are central to implementing such logic, and I thought their Javadocs should be enhanced (included in the attached).

IndexFileDeleter - it is nice that this became non public and somewhat simpler. I added some internal documentation (not javadocs) in that file as I learned how it works. I think these would be useful for others diving into this code. I also modified some variable names for clarity (in the attached).

I don't understand yet why we allow a deletion policy to delete all commits (including the most recent) - TestDeletionPolicy explains this as: "This is useful for adding to a big index w/ autoCommit =false when you know readers are not using it." - so, would I risk losing the big index should uncommited additions fail? what does one earn by this? I first thought we should prevent (exception) deleting the most recent commit, but I must be missing something - could you elaborate on this?

checkpoints() is another - more internal - new concept in this code. At writing this I don't fully understand it. IndexWriter has its own checkpoint() method, but it also calls IndexFileDeleter.checkpoint(). IndexReader only calls IndexFileDeletion.checkpoint() - it does not have a checkpoint() itself. ...mmm... For IndexReader it makes sense since it always commits only at close(), or at explicit calls to commit(). Perhaps I understand it better now... Ok, I added some documentation for this in IndexWriter, I think it would also help others. (in the attached.)

This issue also introduced constants for file names - hasSingleNorms (i.e. nrm) and SINGLE_NORMS_EXTENSION (.fN) were confusing/collating - so I modified .fN to PLAIN_NORMS_EXTENSION.

This issue moved some files logic SegmentInfo. The -1/1/0 logic and especially with norms is confusing, and at least I have to re-read the code carefully each time again and again to be convinced that it is correct. It would be nice when we can get rid of some of the backward compatibility cases here. Anyhow I added some documentation and also replaced the -1/1/0 with constants, I think this makes it easier to understand.

Regards,
Doron


Michael McCandless added a comment - 15/Mar/07 12:14 PM
Thanks for the review Doron!

Your added comments & improvements to the variable names are
excellent. I especially like the new constants (YES, NO, CHECK_DIR,
etc.) in SegmentInfo. I've tweaked your patch here and there, and
attached a modified patch (710.review.take2.diff). If you're happy
with that then go ahead and commit it?

> IndexFileDeleter - it is nice that this became non public and somewhat
> simpler.

I especially like that this class now has very little knowledge of
what files "belong" to an index, especially compared to before. That
knowledge has now been consolidated under SegmentInfo which I think is
the right place for it.

> I don't understand yet why we allow a deletion policy to delete
> all commits (including the most recent) - TestDeletionPolicy
> explains this as: "This is useful for adding to a big index w/
> autoCommit =false when you know readers are not using it." - so,
> would I risk losing the big index should uncommited additions fail?
> what does one earn by this? I first thought we should prevent
> (exception) deleting the most recent commit, but I must be missing
> something - could you elaborate on this?

The use case I was thinking of is: say you already have a large index
and then you need to add a bunch more docs to it. If you are not
allowed to delete the starting commit, then, you will consume
substantially more disk space as you are building your index because
the large segments at the start can't be removed. This would have
made the "autoCommit false" case unnecessarily worse than the
"autoCommit true" case. If for a given application the developer is
concerned about safety (losing index due to crash), then the normal
default policy should be used.

> This issue moved some files logic SegmentInfo. The -1/1/0 logic and
> especially with norms is confusing, and at least I have to re-read
> the code carefully each time again and again to be convinced that it
> is correct. It would be nice when we can get rid of some of the
> backward compatibility cases here. Anyhow I added some documentation
> and also replaced the -1/1/0 with constants, I think this makes it
> easier to understand.

Yes the backwards compatibility code (for pre-2.1 indices) is complex.
The good news is by the time we release this in 2.2, most indices that
upgrade to 2.2 will be 2.1.


Doron Cohen added a comment - 15/Mar/07 07:30 PM
> If for a given application the developer is concerned
> about safety (losing index due to crash), then the
> normal default policy should be used.

Spooky... what if onCommit() also deletes all commits?
(Might this be a pit for users to fall in...?)

I added warnings about this in IndexDeletionPolicy methods.

Just commiited review.take2 + these comments.


Michael McCandless added a comment - 15/Mar/07 10:31 PM
> I added warnings about this in IndexDeletionPolicy methods.

I think that's good. Thanks!