Lucene - Core
  1. Lucene - Core
  2. LUCENE-1451

Can't create NIOFSDirectory w/o setting a system property

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.9
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      NIOFSDirectory.getDirectory() returns a FSDirectory object

      1. LUCENE-1451.patch
        5 kB
        Yonik Seeley
      2. LUCENE-1451.patch
        7 kB
        Yonik Seeley
      3. LUCENE-1451.patch
        6 kB
        Yonik Seeley

        Activity

        Hide
        Doug Cutting added a comment -

        A bit of history, if any care.

        Originally Lucene synchronized only on the Directory to coordinate index writes. This was the primary motivation for the Directory cache.

        http://tinyurl.com/62ugs9#l120

        Later, lock files were added, but Directory synchronization remained as an optimization, to avoid hitting the filesystem when coordinating threads in the same process.

        http://tinyurl.com/5dzsyq#l125

        Now Directory-based synchronization has been dropped entirely, so the Directory cache can now go too.

        Show
        Doug Cutting added a comment - A bit of history, if any care. Originally Lucene synchronized only on the Directory to coordinate index writes. This was the primary motivation for the Directory cache. http://tinyurl.com/62ugs9#l120 Later, lock files were added, but Directory synchronization remained as an optimization, to avoid hitting the filesystem when coordinating threads in the same process. http://tinyurl.com/5dzsyq#l125 Now Directory-based synchronization has been dropped entirely, so the Directory cache can now go too.
        Hide
        Yonik Seeley added a comment -

        Committed. Thanks for the input guys!

        Show
        Yonik Seeley added a comment - Committed. Thanks for the input guys!
        Hide
        Michael McCandless added a comment -

        I think File is fine? But I don't have a strong preference (I can't think of any tradeoffs).

        Show
        Michael McCandless added a comment - I think File is fine? But I don't have a strong preference (I can't think of any tradeoffs).
        Hide
        Yonik Seeley added a comment -

        Any preferences on the public constructor signature... File or String for the path?

        Show
        Yonik Seeley added a comment - Any preferences on the public constructor signature... File or String for the path?
        Hide
        Yonik Seeley added a comment -

        When I was reviewing getDirectory() I can across a race condition anyway - could lead to handing out a closed directory. But people would need to be opening and closing directories rapidly from multiple threads, so I'm not sure if it's actually bitten anyone.

        Show
        Yonik Seeley added a comment - When I was reviewing getDirectory() I can across a race condition anyway - could lead to handing out a closed directory. But people would need to be opening and closing directories rapidly from multiple threads, so I'm not sure if it's actually bitten anyone.
        Hide
        Michael McCandless added a comment -

        True. And in general I don't like have more than one way to do basically the same thing. People can always do the single-Directory-instance-per-File thing externally if they need to.

        Show
        Michael McCandless added a comment - True. And in general I don't like have more than one way to do basically the same thing. People can always do the single-Directory-instance-per-File thing externally if they need to.
        Hide
        Yonik Seeley added a comment -

        Should we deprecate FSDirectory.getDirectory?

        Perhaps... it would cut down the number of silly future errors in the form of NIOFSDirectory.getDirectory().

        Show
        Yonik Seeley added a comment - Should we deprecate FSDirectory.getDirectory? Perhaps... it would cut down the number of silly future errors in the form of NIOFSDirectory.getDirectory().
        Hide
        Michael McCandless added a comment -

        This looks good, Yonik. Should we deprecate FSDirectory.getDirectory?

        Show
        Michael McCandless added a comment - This looks good, Yonik. Should we deprecate FSDirectory.getDirectory?
        Hide
        Yonik Seeley added a comment -

        Ok, here's a version that adds public constructors, and a test that ensures that reading, writing, and locking all work concurrently in the same directory with different FSDirectory implementations.

        AFAIK, there's nowhere in Lucene that synchronizes on the Directory instance (where it matters), so multiple instances per path should be safe.

        Show
        Yonik Seeley added a comment - Ok, here's a version that adds public constructors, and a test that ensures that reading, writing, and locking all work concurrently in the same directory with different FSDirectory implementations. AFAIK, there's nowhere in Lucene that synchronizes on the Directory instance (where it matters), so multiple instances per path should be safe.
        Hide
        Michael McCandless added a comment -

        I think we should simply allow direct instantiation of NIOFSDirectory, and not map to a singular instance per-File. But we should keep it for FSDir for back-compat.

        Show
        Michael McCandless added a comment - I think we should simply allow direct instantiation of NIOFSDirectory, and not map to a singular instance per-File. But we should keep it for FSDir for back-compat.
        Hide
        Hoss Man added a comment -

        Right... the tests in the patch test for this too. If an instance already exists, then it's returned.

        Miller is probably right and we should just remove the cache anyway – but if we do leave it in this still seems dangerous to me, clients aren't guaranteed to get what they ask for.

        Adopting the same pattern as with LockType (throw an Exception if a Directory is in the cache but doens't match the class specified) seems safer.

        Show
        Hoss Man added a comment - Right... the tests in the patch test for this too. If an instance already exists, then it's returned. Miller is probably right and we should just remove the cache anyway – but if we do leave it in this still seems dangerous to me, clients aren't guaranteed to get what they ask for. Adopting the same pattern as with LockType (throw an Exception if a Directory is in the cache but doens't match the class specified) seems safer.
        Hide
        Yonik Seeley added a comment -

        if i'm reading it right, this patch won't work if two different callers ask for Directories backed by the same File using different directoryImpl instances ... the first one will go into the DIRECTORIES cache and "win" for all future calls.

        Right... the tests in the patch test for this too. If an instance already exists, then it's returned. If all the instances are closed, the desired implementation can be returned. So it won't be for all future calls, but it would be difficult to ensure you got the implementation you wanted.

        Hmm good point. Actually why do we do this 'single instance of FSDir per File' again? Is it really needed?

        I had assumed it was locking.... but it goes outside of the FSDirectory and operates directly on File. So, I'm not sure.

        Show
        Yonik Seeley added a comment - if i'm reading it right, this patch won't work if two different callers ask for Directories backed by the same File using different directoryImpl instances ... the first one will go into the DIRECTORIES cache and "win" for all future calls. Right... the tests in the patch test for this too. If an instance already exists, then it's returned. If all the instances are closed, the desired implementation can be returned. So it won't be for all future calls, but it would be difficult to ensure you got the implementation you wanted. Hmm good point. Actually why do we do this 'single instance of FSDir per File' again? Is it really needed? I had assumed it was locking.... but it goes outside of the FSDirectory and operates directly on File. So, I'm not sure.
        Hide
        Mark Miller added a comment -

        Hmm good point. Actually why do we do this 'single instance of FSDir per File' again? Is it really needed?

        While it might be useful sometimes, I don't see how it really is needed. I think a better solution is allowing for direct instantiation while providing a factory class if for some reason you need that behavior. Most apps I have worked with do not need it and I don't understand why its forced on you, other than a bit of overprotection.

        Show
        Mark Miller added a comment - Hmm good point. Actually why do we do this 'single instance of FSDir per File' again? Is it really needed? While it might be useful sometimes, I don't see how it really is needed. I think a better solution is allowing for direct instantiation while providing a factory class if for some reason you need that behavior. Most apps I have worked with do not need it and I don't understand why its forced on you, other than a bit of overprotection.
        Hide
        Michael McCandless added a comment -

        Hmmm, external subclasses should be able to call the protected getDirectory method, right?

        Woops sorry you're right.

        Show
        Michael McCandless added a comment - Hmmm, external subclasses should be able to call the protected getDirectory method, right? Woops sorry you're right.
        Hide
        Michael McCandless added a comment -

        1. if i'm reading it right, this patch won't work if two different callers ask for Directories backed by the same File using different directoryImpl instances ... the first one will go into the DIRECTORIES cache and "win" for all future calls.

        Hmm good point. Actually why do we do this 'single instance of FSDir per File' again? Is it really needed?

        Show
        Michael McCandless added a comment - 1. if i'm reading it right, this patch won't work if two different callers ask for Directories backed by the same File using different directoryImpl instances ... the first one will go into the DIRECTORIES cache and "win" for all future calls. Hmm good point. Actually why do we do this 'single instance of FSDir per File' again? Is it really needed?
        Hide
        Yonik Seeley added a comment -

        Attaching patch with tests.

        Show
        Yonik Seeley added a comment - Attaching patch with tests.
        Hide
        Hoss Man added a comment -

        It seems like we should make it possible to create multiple FSDirectory implementations of different types in the same JVM.

        As mentioned in the thread that spawned this issue...

        1. i'm not sure i see any real motivation for this. the biggest motivation to use an alternate impl will be OS (which is fixed for the whole JVM)
        1. if i'm reading it right, this patch won't work if two different callers ask for Directories backed by the same File using different directoryImpl instances ... the first one will go into the DIRECTORIES cache and "win" for all future calls.

        that second problem can be solved in a similar way to the lockFactory check ... but like i said: this seems like a can of worms, with very little real advantage over a simple static setter.

        Show
        Hoss Man added a comment - It seems like we should make it possible to create multiple FSDirectory implementations of different types in the same JVM. As mentioned in the thread that spawned this issue... 1. i'm not sure i see any real motivation for this. the biggest motivation to use an alternate impl will be OS (which is fixed for the whole JVM) 1. if i'm reading it right, this patch won't work if two different callers ask for Directories backed by the same File using different directoryImpl instances ... the first one will go into the DIRECTORIES cache and "win" for all future calls. that second problem can be solved in a similar way to the lockFactory check ... but like i said: this seems like a can of worms, with very little real advantage over a simple static setter.
        Hide
        Yonik Seeley added a comment -

        Hmmm, external subclasses should be able to call the protected getDirectory method, right?
        But perhaps a test case for external subclasses is warranted.

        Show
        Yonik Seeley added a comment - Hmmm, external subclasses should be able to call the protected getDirectory method, right? But perhaps a test case for external subclasses is warranted.
        Hide
        Michael McCandless added a comment -

        That looks like a good approach to me.

        Though, should we make the new getDirectory that takes the Class directoryImpl arg public, so that external subclasses of FSDirectory could also create their own instances?

        Show
        Michael McCandless added a comment - That looks like a good approach to me. Though, should we make the new getDirectory that takes the Class directoryImpl arg public, so that external subclasses of FSDirectory could also create their own instances?
        Hide
        Yonik Seeley added a comment -

        Attaching untested patch implementing logic above.

        Show
        Yonik Seeley added a comment - Attaching untested patch implementing logic above.
        Hide
        Yonik Seeley added a comment -

        The actual class used for making new instances of Directory is FSDirectory.IMPL, set via the system property "org.apache.lucene.FSDirectory.class".

        It seems like we should make it possible to create multiple FSDirectory implementations of different types in the same JVM.
        The simplest workaround would be to create a
        protected static FSDirectory getDirectory(File file, LockFactory lockFactory, Class impl)
        Subclasses could implement their own getDirectory calls that pass down the appropriate class.

        Not the most elegant solution if one were starting from scratch, but it seems like it would be completely backward compatible, while allowing NIOFSDirectory.getDirectory() to do what it looks like it should do.

        Show
        Yonik Seeley added a comment - The actual class used for making new instances of Directory is FSDirectory.IMPL, set via the system property "org.apache.lucene.FSDirectory.class". It seems like we should make it possible to create multiple FSDirectory implementations of different types in the same JVM. The simplest workaround would be to create a protected static FSDirectory getDirectory(File file, LockFactory lockFactory, Class impl) Subclasses could implement their own getDirectory calls that pass down the appropriate class. Not the most elegant solution if one were starting from scratch, but it seems like it would be completely backward compatible, while allowing NIOFSDirectory.getDirectory() to do what it looks like it should do.

          People

          • Assignee:
            Yonik Seeley
            Reporter:
            Yonik Seeley
          • Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development