
Key: LUCENE-635
Type: Improvement
Status: Resolved
Resolution: Fixed
Priority: Minor
Assignee: Yonik Seeley
Reporter: Michael McCandless
Votes: 0
Watchers: 4
Lucene - Java

[PATCH] Decouple locking implementation from Directory implementation

Created: 27/Jul/06 12:31 AM   Updated: 30/Aug/06 07:34 PM
Component/s: Index
Affects Version/s: 2.0.0
Fix Version/s: 2.1

Time Tracking:
Not Specified

File Attachments (all licensed for inclusion in ASF works):
  • LUCENE-635-Aug27.patch (text file, 43 kB), 2006-08-27 11:45 AM, Michael McCandless
  • LUCENE-635-Aug3.patch (text file, 43 kB), 2006-08-03 11:09 AM, Michael McCandless
  • patch-Jul26.tar (file, 90 kB), 2006-07-27 12:34 AM, Michael McCandless
Issue Links:
Reference
 

Resolution Date: 29/Aug/06 01:12 AM


Description
This is a spinoff of http://issues.apache.org/jira/browse/LUCENE-305.

I've opened this new issue because its scope is wider than that of
LUCENE-305.

This is a patch originally created by Jeff Patterson (see above link)
and then modified as described here:

http://issues.apache.org/jira/browse/LUCENE-305#action_12418493

with some small additional changes:

  • For each FSDirectory.getDirectory(), I made a corresponding
    version that also accepts a LockFactory instance. So, you can
    construct an FSDirectory with your own LockFactory.
  • Cascaded defaulting for FSDirectory's LockFactory implementation:
    if you pass in a LockFactory instance, it's used; else if
    setDisableLocks was called, we use NoLockFactory; else, if the
    system property "org.apache.lucene.store.FSDirectoryLockFactoryClass"
    is defined, we use that; finally, we'll use the original locking
    implementation (SimpleFSLockFactory).
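
The cascade in the second bullet can be sketched roughly as follows. This is a minimal illustration, not the code from the patch: the classes here are empty stand-ins that only borrow the names used above, and LockFactoryDefaults, resolve() and the no-argument-constructor requirement are assumptions made for the example.

```java
import java.util.Properties;

// Hypothetical stand-ins for the classes named above; not Lucene's real sources.
abstract class LockFactory {}
class SimpleFSLockFactory extends LockFactory {}
class NoLockFactory extends LockFactory {}

class LockFactoryDefaults {
    static final String PROPERTY = "org.apache.lucene.store.FSDirectoryLockFactoryClass";

    // Picks a factory in the order the description gives:
    // explicit instance, then disabled locks, then the property, then the default.
    static LockFactory resolve(LockFactory explicit, boolean locksDisabled, Properties props) {
        if (explicit != null) {
            return explicit;
        }
        if (locksDisabled) {
            return new NoLockFactory();
        }
        String className = props.getProperty(PROPERTY);
        if (className != null) {
            try {
                // Assumption: the named class has a no-argument constructor.
                return (LockFactory) Class.forName(className)
                        .getDeclaredConstructor().newInstance();
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException("Cannot instantiate " + className, e);
            }
        }
        return new SimpleFSLockFactory();
    }
}
```

Passing System.getProperties() as the last argument would mimic reading the system property; a plain Properties object is used here only so the order of the checks is easy to exercise.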

The gist is that all locking code has been moved out of *Directory and
into subclasses of a new abstract LockFactory class. You can now set
the LockFactory of a Directory to change how it does locking. For
example, you can create an FSDirectory but set its locking to
SingleInstanceLockFactory (if you know all writing/reading will take
place in a single JVM).

The changes pass all unit tests (on Ubuntu Linux Sun Java 1.5 and
Windows XP Sun Java 1.4), and I added another TestCase to test the
LockFactory code.

Note that LockFactory defaults are not changed: FSDirectory defaults
to SimpleFSLockFactory and RAMDirectory defaults to
SingleInstanceLockFactory.

Next step (separate issue) is to create a LockFactory that uses the OS
native locks (through java.nio).



Comments
Michael McCandless added a comment - 27/Jul/06 12:34 AM
TAR file containing sources as first cut at implementation. I've also included patch files off revision 425918.

Michael McCandless added a comment - 03/Aug/06 11:09 AM
This patch contains the same source changes as my July 26 patch, but this one is done "correctly" as the output of a single top-level "svn diff" command (i.e., I ran "svn add ..." locally for the new files). I also added an entry to CHANGES.txt, and corrected newlines on one of the sources.

Michael McCandless added a comment - 14/Aug/06 10:53 AM

Has anyone had a chance to look at this patch?

This should be fully backwards compatible: old APIs have not changed.
I've just added new ones that allow you to set the locking
implementation per Directory. The default Locking implementation also
has not changed; it's just been refactored out of the *Directory.java
sources. So this should be a drop-in change to existing users of
Lucene.

This change passes all unit tests, and I added a new test (with 9 test
cases) for LockFactory.

The above LUCENE-635-Aug3.patch still applies cleanly to the
current svn HEAD (431322).


Otis Gospodnetic added a comment - 14/Aug/06 02:43 PM
I took a look at it a few weeks back. If nobody takes care of it, I'll look at it again and hopefully commit it after I return from vacation in September.

Michael McCandless added a comment - 14/Aug/06 03:08 PM
Awesome, thanks Otis! Have a great vacation!

Yonik Seeley added a comment - 17/Aug/06 09:16 PM
Very nice job Michael... very thorough.
In general, locking & synchronization is something that requires hard review since it's hard to test for correctness, but the thoroughness of your tests increases my confidence.

Super-minor improvement while I'm looking at it: could the following
method body be replaced with "synchronized(locks) {return locks.add(lockName);}" ?

+ public boolean obtain() throws IOException {
+   synchronized(locks) {
+     if (!locks.contains(lockName)) {
+       locks.add(lockName);
+       return true;
+     } else {
+       return false;
+     }
+   }
+ }
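
The compact form works because Set.add returns true only when the element was not already in the set, which is exactly the contains-then-add check above. A small self-contained sketch, assuming the lock names live in a shared java.util.Set (the class name and constructor here are made up for illustration and are not taken from the patch):

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of a single-JVM lock; names are illustrative only.
class SingleInstanceLockSketch {
    private final Set<String> locks;   // shared by every lock made by one factory
    private final String lockName;

    SingleInstanceLockSketch(Set<String> locks, String lockName) {
        this.locks = locks;
        this.lockName = lockName;
    }

    boolean obtain() {
        synchronized (locks) {
            // Set.add returns true only if lockName was not already present.
            return locks.add(lockName);
        }
    }

    void release() {
        synchronized (locks) {
            locks.remove(lockName);
        }
    }
}
```

Two lock objects sharing one set and one name behave as expected: the first obtain() succeeds, the second fails until the first lock is released.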

As far as backward compatibility, could you speak to
1) FSDirectory.disableLocks
2) Slight changes in how FSLock.obtain works (the old one recreated the lock dir each time)


Michael McCandless added a comment - 17/Aug/06 10:55 PM
Thank you! I agree, locking is sneaky and requires very thorough
review & testing.

Nice, I definitely like that more compact version of
SingleInstanceLockFactory.obtain – I'll fold that in.

On FSDirectory.disableLocks, which is a private static boolean set by
"setDisableLocks": if this is "true" when the FSDirectory is created,
then FSDirectory uses the NoLockFactory for its locking; else it uses
the default SimpleFSLockFactory. (This applies only when the caller
did not provide a LockFactory instance.)

OOH I do see one difference: in the current code, calling
setDisableLocks affects even a previously created FSDirectory. But
with my changes, only newly created FSDirectory instances will have
locking disabled. I.e., it's no longer "retroactive" to all previously
created FSDirectory instances. Hmm. OK I will fix this case.

On SimpleFSLock.obtain, you are correct: I lost the creation of the
lock dir (if it doesn't exist) with each obtain. Good catch! I
didn't mean to lose it. I will put it back in, and move it out of the
init() method in SimpleFSLockFactory.

Thanks for reviewing this!


Yonik Seeley added a comment - 17/Aug/06 11:23 PM
Yeah... those were the slight differences in external behavior I saw.
That doesn't mean it's wrong, but it does mean we should examine if it's OK to change it (or just defer the changes to a later patch...).

Michael McCandless added a comment - 18/Aug/06 09:47 PM

OK, does anyone have a strong opinion one way or another on these
small changes?

I would lean towards keeping the small change to "setDisableLocks()".
Meaning, it's only when you create a FSDirectory that the static
"disableLocks" value is checked. So, changing disableLocks would no
longer retroactively affect all previously created FSDirectories. The
retroactive behaviour seems too "powerful" – what if I wanted some to
be disabled and others not? Was it intentional that it was this
powerful? If we do this we could document it in CHANGES.txt as a small
difference. Or, again, I can put back the old behaviour if people
think that's best.
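
To make the difference concrete, here is a minimal sketch of the "checked at creation" behaviour. FSDirectorySketch and usesLocking() are hypothetical names for illustration; only the static disableLocks flag and setDisableLocks come from the discussion.

```java
// Hypothetical sketch; this is not the real FSDirectory.
class FSDirectorySketch {
    private static boolean disableLocks = false;

    static void setDisableLocks(boolean value) {
        disableLocks = value;
    }

    // Captured once, when the directory is created.
    private final boolean lockingDisabled;

    FSDirectorySketch() {
        this.lockingDisabled = disableLocks;
    }

    boolean usesLocking() {
        return !lockingDisabled;
    }
}
```

A directory created before setDisableLocks(true) keeps locking, and one created afterwards does not. Under the old, retroactive behaviour both would have stopped locking.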

On the second one, I agree we should keep the current behaviour of
checking existence of & creating the LOCK DIR with each obtain. There
would be some performance benefit to only doing it on creating the
lock factory, but, I don't think that's worth the risk of the change.
So I'll go ahead & fix that one.
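
A rough sketch of what "check and create the lock dir with each obtain" looks like with plain java.io.File. The class name and error messages are assumptions for illustration and are not copied from the patch.

```java
import java.io.File;
import java.io.IOException;

// Hypothetical sketch of a file-based lock; not the patch's actual code.
class SimpleFSLockSketch {
    private final File lockDir;
    private final File lockFile;

    SimpleFSLockSketch(File lockDir, String lockName) {
        this.lockDir = lockDir;
        this.lockFile = new File(lockDir, lockName);
    }

    boolean obtain() throws IOException {
        // Checked on every call, so a lock directory that disappears after
        // construction is recreated before the lock file is made.
        if (!lockDir.exists()) {
            if (!lockDir.mkdirs()) {
                throw new IOException("Cannot create directory: " + lockDir);
            }
        } else if (!lockDir.isDirectory()) {
            throw new IOException("Not a directory: " + lockDir);
        }
        // createNewFile returns false if the file already exists.
        return lockFile.createNewFile();
    }

    void release() {
        lockFile.delete();
    }
}
```

Doing the check only when the factory is created would save a couple of file system calls per obtain, but a lock directory deleted in between would then make every later obtain fail.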


Yonik Seeley added a comment - 26/Aug/06 02:51 PM
> I would lean towards keeping the small change to "setDisableLocks()".
> Meaning, it's only when you create a FSDirectory that the static
> "disableLocks" value is checked.

I think this is probably OK. Besides this being a little-used method, anyone who truly wanted locking disabled (for read-only media, for example) would be calling setDisableLocks() before opening an IndexReader anyway.


Michael McCandless added a comment - 27/Aug/06 11:45 AM
OK, I agree. I've updated the CHANGES.txt to state this small change.

And I've fixed SimpleFSLockFactory to move directory existence checking & creation back into the obtain() method.

New patch attached!


Yonik Seeley added a comment - 29/Aug/06 01:12 AM
Committed. Thanks Michael!

Doron Cohen added a comment - 29/Aug/06 07:54 PM
While updating my patch for 665 according to the changes here, I noticed something - I may be wrong here - but it seems to me that until this change, all the actual FS access operations were performed by FSDirectory, using the Directory API.

The new SimpleFSLock and SimpleFSLockFactory also access the FS directly, not through FSDirectory API.

The Directory abstraction in Lucene makes it possible to develop Lucene-in-RAM, Lucene-in-DB, etc. It is a nice feature.

Guess we can say: "well, now the abstraction is made of two interfaces - Lock and Directory, just make sure you use 'matching' implementations of them." This seems weaker than before.

Or, we can limit all file access to go through FSDirectory:

  • one possibility is to add to LockFactory a Directory object (as a class member); SimpleFSLockFactory can require that Directory object to be an FSDirectory (cast, and fail otherwise); also, FSDirectory should be extended with createSingleFile(), mkdirs() and isDirectory().
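
A minimal sketch of the shape this proposal might take. Every type below is a hypothetical stand-in written for this example (none of it is Lucene's real API); it only shows the "Directory member, cast and fail otherwise" idea.

```java
import java.io.File;

// Hypothetical stand-ins sketching the proposal; not Lucene's real classes.
abstract class Directory {}

class RAMDirectory extends Directory {}

class FSDirectory extends Directory {
    private final File path;

    FSDirectory(File path) {
        this.path = path;
    }

    // One of the low-level helpers the proposal would add to FSDirectory.
    boolean isDirectory() {
        return path.isDirectory();
    }
}

// A lock factory that always knows which Directory it belongs to.
abstract class DirectoryBoundLockFactory {
    protected final Directory directory;

    DirectoryBoundLockFactory(Directory directory) {
        this.directory = directory;
    }
}

// File-based locking insists on an FSDirectory and fails otherwise.
class FSBoundLockFactory extends DirectoryBoundLockFactory {
    FSBoundLockFactory(Directory directory) {
        super(requireFS(directory));
    }

    private static Directory requireFS(Directory d) {
        if (!(d instanceof FSDirectory)) {
            throw new IllegalArgumentException("FSDirectory required, got " + d);
        }
        return d;
    }

    FSDirectory fsDirectory() {
        return (FSDirectory) directory;
    }
}
```

With this shape, a mismatched pairing such as file-based locks on a RAMDirectory is rejected when the factory is constructed rather than surfacing later as stray file system access.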

Michael McCandless added a comment - 30/Aug/06 11:19 AM
With this change, "Directory on DB", "Directory on RAM", etc., still
work correctly. In fact you can completely override the LockFactory
behavior by implementing your own "makeLock" in a subclass of
Directory if you want to.

This change just opens up the freedom to allow you to separately
choose how your locking is done. I think this is important because
many applications have different locking requirements. Perhaps you
require no locking at all (NoLockFactory or legacy
FSDirectory.setDisableLocks), or everything happens in one JVM
(SingleInstanceLockFactory), etc.

This also opens up the chance for people to work around locking issues
(e.g. over NFS) until we can get lock-less commits finished.

I'm working on a LockFactory implementation that uses native OS locks
(java.nio.*) and this will be another place that accesses the file
system. The java.io.File.createNewFile (used by the
SimpleFSLockFactory) has a very spooky warning about not using it for
locking.

We could (as you're suggesting) indeed extend FSDirectory so that it
provided the low level methods required by a locking implementation,
and then alter SimpleFSLockFactory/NativeFSLockFactory (or make a new
LockFactory) so that all underlying IO is through the FSDirectory
instead.


Doron Cohen added a comment - 30/Aug/06 07:34 PM
> We could (as you're suggesting) indeed extend FSDirectory so that it
> provided the low level methods required by a locking implementation,
> and then alter SimpleFSLockFactory/NativeFSLockFactory (or make a new
> LockFactory) so that all underlying IO is through the FSDirectory instead.

Yes, this is exactly (and only) what I am suggesting to consider - to include a Directory member within the LockFactory so that it is clear that any LockFactory implementation operates in the realm of a directory (implementation) and is using it for any actual store accesses.