Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 5.3, 6.0
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      See LUCENE-6507 for some background. In general it would be great if you can just acquire an immutable lock (or you get a failure) and then you close that to release it.

      Today the API might be too much for what is needed by IW.

      1. LUCENE-6508.patch
        145 kB
        Robert Muir
      2. LUCENE-6508.patch
        130 kB
        Robert Muir
      3. LUCENE-6508.patch
        52 kB
        Uwe Schindler
      4. LUCENE-6508.patch
        46 kB
        Robert Muir
      5. LUCENE-6508-deadcode1.patch
        2 kB
        Uwe Schindler

        Activity

        Hide
        Uwe Schindler added a comment -

        Thanks Robert for opening that issue. This was my initial idea about Simon's issue. And I was already looking into fixing that.

        I will take care of this issue after or while buzzwords. Then I can discuss with Simon Willnauer in person about sleeping horses and locks

        Show
        Uwe Schindler added a comment - Thanks Robert for opening that issue. This was my initial idea about Simon's issue. And I was already looking into fixing that. I will take care of this issue after or while buzzwords. Then I can discuss with Simon Willnauer in person about sleeping horses and locks
        Hide
        Michael McCandless added a comment -

        +1

        Show
        Michael McCandless added a comment - +1
        Hide
        Robert Muir added a comment -

        Random ideas to make this better:

        • remove timeouts
        • remove Lock.isLocked(), Lock.obtain(), IndexWriter.isLocked(Dir), etc
        • just have Directory.obtain() which either succeeds and gives you Closeable or gives IOException
        • obtain() should return an immutable thing, that will simplify a lot here.
        • maybe Directory should "know" of the lock and check on each createOutput, delete, rename, etc. This would give more safety.
        • maybe add method Lock.isValid(). For network filesystems, things like disconnected nodes can cause locks to be "lost". look into things like FileLock.isValid and see if they are useful. (SimpleFS can implement with Files.exists).
        Show
        Robert Muir added a comment - Random ideas to make this better: remove timeouts remove Lock.isLocked(), Lock.obtain(), IndexWriter.isLocked(Dir), etc just have Directory.obtain() which either succeeds and gives you Closeable or gives IOException obtain() should return an immutable thing, that will simplify a lot here. maybe Directory should "know" of the lock and check on each createOutput, delete, rename, etc. This would give more safety. maybe add method Lock.isValid(). For network filesystems, things like disconnected nodes can cause locks to be "lost". look into things like FileLock.isValid and see if they are useful. (SimpleFS can implement with Files.exists).
        Hide
        Michael McCandless added a comment -

        +1 to this plan

        Show
        Michael McCandless added a comment - +1 to this plan
        Hide
        Uwe Schindler added a comment -

        In LockFactory we need the following: BasDirectory.makeLock() currently delegates directly to the LockFactory (its a final method). So we should rename this method in LockFactory, too and make it return a lock only after aquire. Therefore, the LockFactory would do what Robert proposed. Otherwise I like the proposal. I will work the next days on it (I already started to rename some stuff).

        The lock cannot be completely immutable. Because the Closeable interface should still be implemented correctly: close() must be idempotent, so we still need the state. But it is immutable in that sense that you cannot re-obtain the lock.

        Show
        Uwe Schindler added a comment - In LockFactory we need the following: BasDirectory.makeLock() currently delegates directly to the LockFactory (its a final method). So we should rename this method in LockFactory, too and make it return a lock only after aquire. Therefore, the LockFactory would do what Robert proposed. Otherwise I like the proposal. I will work the next days on it (I already started to rename some stuff). The lock cannot be completely immutable. Because the Closeable interface should still be implemented correctly: close() must be idempotent, so we still need the state. But it is immutable in that sense that you cannot re-obtain the lock.
        Hide
        Uwe Schindler added a comment -

        In any case, we should not hurry this. We should iterate the API several times. I hope more people look into this this time. Last year when I refactored this for the first time, the interest was quite low.

        Show
        Uwe Schindler added a comment - In any case, we should not hurry this. We should iterate the API several times. I hope more people look into this this time. Last year when I refactored this for the first time, the interest was quite low.
        Hide
        Robert Muir added a comment -

        There is no hurry. I will prototype some stuff and see where I get.

        This is exactly why i moved this out of LUCENE-6507, so we can take our time and make this work better in the future. And that is separate from fixing completely broken bugs like LUCENE-6507 which are really release blockers

        Show
        Robert Muir added a comment - There is no hurry. I will prototype some stuff and see where I get. This is exactly why i moved this out of LUCENE-6507 , so we can take our time and make this work better in the future. And that is separate from fixing completely broken bugs like LUCENE-6507 which are really release blockers
        Hide
        Uwe Schindler added a comment -

        I may have been to aggressive, sorry for that. I am fine with the given proposals, my intent is just to have it better this time - it was a large step forward last September to remove the ability to have LockFactory and Directory work on different directories at all. I already removed lots of outdated APIs like forcefully unlocking.

        Also in this issue, I still want to keep the LockFactory and Directory separation alive, because it allows to configure this much better. But I know we agree on this

        I will help on that issue, maybe we should open a branch? I knew from last time that this was a horrible amount of code to touch...

        Show
        Uwe Schindler added a comment - I may have been to aggressive, sorry for that. I am fine with the given proposals, my intent is just to have it better this time - it was a large step forward last September to remove the ability to have LockFactory and Directory work on different directories at all. I already removed lots of outdated APIs like forcefully unlocking. Also in this issue, I still want to keep the LockFactory and Directory separation alive, because it allows to configure this much better. But I know we agree on this I will help on that issue, maybe we should open a branch? I knew from last time that this was a horrible amount of code to touch...
        Hide
        Robert Muir added a comment -

        Agreed, I will make a branch, it will take time. I still want to improve tests around this as well.

        Some new tests from LUCENE-6507 are better, but to me its still not ideal. I think we should move to a kind of BaseLFTestCase like our other important classes and each impl has a subclass with additional tests for its own pecularities.

        Show
        Robert Muir added a comment - Agreed, I will make a branch, it will take time. I still want to improve tests around this as well. Some new tests from LUCENE-6507 are better, but to me its still not ideal. I think we should move to a kind of BaseLFTestCase like our other important classes and each impl has a subclass with additional tests for its own pecularities.
        Hide
        Uwe Schindler added a comment -

        Just for record, so we don't forget:

        • the inner class oal.store.Lock.With is obsolete, it is no longer used anywhere. Can be removed ASAP. It was just there before the famous Java 7 try-with-resources was added. As Lock is closeable now, you can do:
        try (Lock lock = directory.obtainLock(name)) {
          // do stuff
        }
        

        So the class is obsolete.

        Show
        Uwe Schindler added a comment - Just for record, so we don't forget: the inner class oal.store.Lock.With is obsolete, it is no longer used anywhere. Can be removed ASAP. It was just there before the famous Java 7 try-with-resources was added. As Lock is closeable now, you can do: try (Lock lock = directory.obtainLock(name)) { // do stuff } So the class is obsolete.
        Hide
        Uwe Schindler added a comment -

        First patch removing this dead code.

        Show
        Uwe Schindler added a comment - First patch removing this dead code.
        Hide
        Robert Muir added a comment -

        Hi Uwe, I propose something like the following.

        I like the example usage actually and think we should keep it.

        I think javadocs need improvement on the semantics of close(), i dont think we should use specialized exceptions but just deliver any exact exception that happens.

        I also want to set things up for impls to improve safety (best-effort) in some cases. For example its dangerous for SimpleFSLockFactory to delete the lock file, if someone removed the file and something else has it. It just contributes to chaos, when a file creation time check could easily detect it.

        /** An interprocess mutex lock.
         * <p>Typical use might look like:<pre class="prettyprint">
         *   try (Lock lock = directory.obtainLock("my.lock")) {
         *     public Object doBody() {
         *       <i>... code to execute while locked ...</i>
         *     }
         *   }.run();
         * </pre>
         *
         * @see Directory#obtainLock(String)
         *
         * @lucene.internal
         */
        public abstract class Lock implements Closeable {
        
          /** 
           * Releases exclusive access.
           * <p>
           * Note that exceptions thrown from close may require
           * human intervention, as it may mean the lock was no
           * longer valid, or that fs permissions prevent removal
           * of the lock file, or other reasons.
           * <p>
           * {@inheritDoc} 
           */
          public abstract void close() throws IOException;
          
          /** 
           * Best effort check that this lock is still valid. Locks
           * could become invalidated externally for a number of reasons,
           * for example if a user deletes the lock file manually or
           * when a network filesystem is in use. 
           * @throws IOException if the lock is no longer valid.
           */
          public abstract void ensureValid() throws IOException;
        }
        
        Show
        Robert Muir added a comment - Hi Uwe, I propose something like the following. I like the example usage actually and think we should keep it. I think javadocs need improvement on the semantics of close(), i dont think we should use specialized exceptions but just deliver any exact exception that happens. I also want to set things up for impls to improve safety (best-effort) in some cases. For example its dangerous for SimpleFSLockFactory to delete the lock file, if someone removed the file and something else has it. It just contributes to chaos, when a file creation time check could easily detect it. /** An interprocess mutex lock. * <p>Typical use might look like:<pre class= "prettyprint" > * try (Lock lock = directory.obtainLock( "my.lock" )) { * public Object doBody() { * <i>... code to execute while locked ...</i> * } * }.run(); * </pre> * * @see Directory#obtainLock( String ) * * @lucene.internal */ public abstract class Lock implements Closeable { /** * Releases exclusive access. * <p> * Note that exceptions thrown from close may require * human intervention, as it may mean the lock was no * longer valid, or that fs permissions prevent removal * of the lock file, or other reasons. * <p> * {@inheritDoc} */ public abstract void close() throws IOException; /** * Best effort check that this lock is still valid. Locks * could become invalidated externally for a number of reasons, * for example if a user deletes the lock file manually or * when a network filesystem is in use. * @ throws IOException if the lock is no longer valid. */ public abstract void ensureValid() throws IOException; }
        Hide
        Robert Muir added a comment -

        Here is a prototype patch (does not compile, just for looking).

        Implements Lock as described above. Directory.obtainLock looks like this:

          /** 
           * Returns an obtained {@link Lock}.
           * @param name the name of the lock file
           * @throws IOException if the lock could not be obtained
           */
          public abstract Lock obtainLock(String name) throws IOException;
        

        IndexWriter wraps the directory with a simple DirectoryWrapper, that calls isValid() on the write.lock before any major destructive operation (e.g. createOutput, delete, rename) for best-effort safety.

        SimpleFS isValid() is mainly implemented by a ctime check. This LockFactory is interesting because its close() method could potentially release someone else's lock if stuff has gone wrong. It calls isValid() as best effort detection and informs the user if the lock file cannot be safely deleted and doesn't try to delete it. It also informs when things seem ok, but it is unable to delete the file for some reason.

        NativeFS checks all necessary components in isValid(): map entry, lock, filechannel, os file descriptor, and underlying file ctime. It doesn't need the careful logic of SimpleFS in close, because nothing external can cause it to release someone else's lock.

        In both cases i tried to improve exception messages and toStrings.

        obtain timeout, dangerous loops and discarding exceptions are all removed. There were two justifications for this:

        • claimed sporatic permission denied on mac os X for NativeFSLock
        • claimed sporatic access denied on windows for SimpleFSLock

        Either these were bugs in our previous locking, or real bugs/crazy behavior in the JDK or OS. If its the former, we dont need the timeout ever. If its the latter, lets get specific exceptions and cases and try to dig into what is wrong. Worst case, we should make Windows/OS X versions of LockFactory for any wierd stuff like that, and let them take timeout as their own parameter (user must deal with it). But this should be a LockFactory impl detail. Maybe someone even wants to make a LockFactory that blocks for some crazy reason (uses fc.lock instead of .tryLock)

        I didnt try cutting over tests yet or other things, that is more work.

        Show
        Robert Muir added a comment - Here is a prototype patch (does not compile, just for looking). Implements Lock as described above. Directory.obtainLock looks like this: /** * Returns an obtained {@link Lock}. * @param name the name of the lock file * @ throws IOException if the lock could not be obtained */ public abstract Lock obtainLock( String name) throws IOException; IndexWriter wraps the directory with a simple DirectoryWrapper, that calls isValid() on the write.lock before any major destructive operation (e.g. createOutput, delete, rename) for best-effort safety. SimpleFS isValid() is mainly implemented by a ctime check. This LockFactory is interesting because its close() method could potentially release someone else's lock if stuff has gone wrong. It calls isValid() as best effort detection and informs the user if the lock file cannot be safely deleted and doesn't try to delete it. It also informs when things seem ok, but it is unable to delete the file for some reason. NativeFS checks all necessary components in isValid(): map entry, lock, filechannel, os file descriptor, and underlying file ctime. It doesn't need the careful logic of SimpleFS in close, because nothing external can cause it to release someone else's lock. In both cases i tried to improve exception messages and toStrings. obtain timeout, dangerous loops and discarding exceptions are all removed. There were two justifications for this: claimed sporatic permission denied on mac os X for NativeFSLock claimed sporatic access denied on windows for SimpleFSLock Either these were bugs in our previous locking, or real bugs/crazy behavior in the JDK or OS. If its the former, we dont need the timeout ever. If its the latter, lets get specific exceptions and cases and try to dig into what is wrong. Worst case, we should make Windows/OS X versions of LockFactory for any wierd stuff like that, and let them take timeout as their own parameter (user must deal with it). But this should be a LockFactory impl detail. Maybe someone even wants to make a LockFactory that blocks for some crazy reason (uses fc.lock instead of .tryLock) I didnt try cutting over tests yet or other things, that is more work.
        Hide
        Uwe Schindler added a comment -

        I like your proposal. I already worked with it and implemented the lock factory tester:

        ant test-lock-factory passes successfully.

        I also changed the javadocs a bit and renamed the ValidatingDirectoryWrapper to LockValidatingDirectoryWrapper. We have way too many validating wrappers, so we should have the term "Lock" in the name

        While implementing the lock stress tester, I noticed that it is now very hard to differentiate between an conventional I/O error and failure to obtain the lock. Maybe we should still "preserve" LockObtainFailed Exception. I am not so happy with having no Exception anymore that clearly states that the lock was not successfully obtained (also for users).

        Show
        Uwe Schindler added a comment - I like your proposal. I already worked with it and implemented the lock factory tester: ant test-lock-factory passes successfully. I also changed the javadocs a bit and renamed the ValidatingDirectoryWrapper to LockValidatingDirectoryWrapper. We have way too many validating wrappers, so we should have the term "Lock" in the name While implementing the lock stress tester, I noticed that it is now very hard to differentiate between an conventional I/O error and failure to obtain the lock. Maybe we should still "preserve" LockObtainFailed Exception. I am not so happy with having no Exception anymore that clearly states that the lock was not successfully obtained (also for users).
        Hide
        ASF subversion and git services added a comment -

        Commit 1682421 from Uwe Schindler in branch 'dev/branches/lucene6508'
        [ https://svn.apache.org/r1682421 ]

        LUCENE-6508: Create branch

        Show
        ASF subversion and git services added a comment - Commit 1682421 from Uwe Schindler in branch 'dev/branches/lucene6508' [ https://svn.apache.org/r1682421 ] LUCENE-6508 : Create branch
        Hide
        ASF subversion and git services added a comment -

        Commit 1682422 from Uwe Schindler in branch 'dev/branches/lucene6508'
        [ https://svn.apache.org/r1682422 ]

        LUCENE-6508: Initial commit of Robert's and Uwe's code

        Show
        ASF subversion and git services added a comment - Commit 1682422 from Uwe Schindler in branch 'dev/branches/lucene6508' [ https://svn.apache.org/r1682422 ] LUCENE-6508 : Initial commit of Robert's and Uwe's code
        Hide
        ASF subversion and git services added a comment -

        Commit 1682471 from Robert Muir in branch 'dev/branches/lucene6508'
        [ https://svn.apache.org/r1682471 ]

        LUCENE-6508: fix some tests/test-framework and fix stupid bug

        Show
        ASF subversion and git services added a comment - Commit 1682471 from Robert Muir in branch 'dev/branches/lucene6508' [ https://svn.apache.org/r1682471 ] LUCENE-6508 : fix some tests/test-framework and fix stupid bug
        Hide
        ASF subversion and git services added a comment -

        Commit 1682474 from Robert Muir in branch 'dev/branches/lucene6508'
        [ https://svn.apache.org/r1682474 ]

        LUCENE-6508: remove double-obtain tests, no longer possible

        Show
        ASF subversion and git services added a comment - Commit 1682474 from Robert Muir in branch 'dev/branches/lucene6508' [ https://svn.apache.org/r1682474 ] LUCENE-6508 : remove double-obtain tests, no longer possible
        Hide
        ASF subversion and git services added a comment -

        Commit 1682520 from Robert Muir in branch 'dev/branches/lucene6508'
        [ https://svn.apache.org/r1682520 ]

        LUCENE-6508: fix tests and cleanup

        Show
        ASF subversion and git services added a comment - Commit 1682520 from Robert Muir in branch 'dev/branches/lucene6508' [ https://svn.apache.org/r1682520 ] LUCENE-6508 : fix tests and cleanup
        Hide
        ASF subversion and git services added a comment -

        Commit 1682523 from Robert Muir in branch 'dev/branches/lucene6508'
        [ https://svn.apache.org/r1682523 ]

        LUCENE-6508: add back LockObtainFailedException but in a simpler way

        Show
        ASF subversion and git services added a comment - Commit 1682523 from Robert Muir in branch 'dev/branches/lucene6508' [ https://svn.apache.org/r1682523 ] LUCENE-6508 : add back LockObtainFailedException but in a simpler way
        Hide
        ASF subversion and git services added a comment -

        Commit 1682525 from Robert Muir in branch 'dev/branches/lucene6508'
        [ https://svn.apache.org/r1682525 ]

        LUCENE-6508: add back LockObtainFailedException for SingleInstance too

        Show
        ASF subversion and git services added a comment - Commit 1682525 from Robert Muir in branch 'dev/branches/lucene6508' [ https://svn.apache.org/r1682525 ] LUCENE-6508 : add back LockObtainFailedException for SingleInstance too
        Hide
        ASF subversion and git services added a comment -

        Commit 1682526 from Uwe Schindler in branch 'dev/branches/lucene6508'
        [ https://svn.apache.org/r1682526 ]

        LUCENE-6508: Make the lock stress tester use new Exception; add Windows-specific Exception to SimpleFSLockFactory

        Show
        ASF subversion and git services added a comment - Commit 1682526 from Uwe Schindler in branch 'dev/branches/lucene6508' [ https://svn.apache.org/r1682526 ] LUCENE-6508 : Make the lock stress tester use new Exception; add Windows-specific Exception to SimpleFSLockFactory
        Hide
        ASF subversion and git services added a comment -

        Commit 1682574 from Robert Muir in branch 'dev/branches/lucene6508'
        [ https://svn.apache.org/r1682574 ]

        LUCENE-6508: get tests passing

        Show
        ASF subversion and git services added a comment - Commit 1682574 from Robert Muir in branch 'dev/branches/lucene6508' [ https://svn.apache.org/r1682574 ] LUCENE-6508 : get tests passing
        Hide
        ASF subversion and git services added a comment -

        Commit 1682579 from Robert Muir in branch 'dev/branches/lucene6508'
        [ https://svn.apache.org/r1682579 ]

        LUCENE-6508: some steps towards back compat and tests

        Show
        ASF subversion and git services added a comment - Commit 1682579 from Robert Muir in branch 'dev/branches/lucene6508' [ https://svn.apache.org/r1682579 ] LUCENE-6508 : some steps towards back compat and tests
        Hide
        ASF subversion and git services added a comment -

        Commit 1682584 from Robert Muir in branch 'dev/branches/lucene6508'
        [ https://svn.apache.org/r1682584 ]

        LUCENE-6508: more test reorganization

        Show
        ASF subversion and git services added a comment - Commit 1682584 from Robert Muir in branch 'dev/branches/lucene6508' [ https://svn.apache.org/r1682584 ] LUCENE-6508 : more test reorganization
        Hide
        ASF subversion and git services added a comment -

        Commit 1682585 from Robert Muir in branch 'dev/branches/lucene6508'
        [ https://svn.apache.org/r1682585 ]

        LUCENE-6508: add more tests

        Show
        ASF subversion and git services added a comment - Commit 1682585 from Robert Muir in branch 'dev/branches/lucene6508' [ https://svn.apache.org/r1682585 ] LUCENE-6508 : add more tests
        Hide
        ASF subversion and git services added a comment -

        Commit 1682691 from Michael McCandless in branch 'dev/branches/lucene6508'
        [ https://svn.apache.org/r1682691 ]

        LUCENE-6508: fix 'mock directory not closed' test bugs

        Show
        ASF subversion and git services added a comment - Commit 1682691 from Michael McCandless in branch 'dev/branches/lucene6508' [ https://svn.apache.org/r1682691 ] LUCENE-6508 : fix 'mock directory not closed' test bugs
        Hide
        ASF subversion and git services added a comment -

        Commit 1682697 from Michael McCandless in branch 'dev/branches/lucene6508'
        [ https://svn.apache.org/r1682697 ]

        LUCENE-6508: also verify lock on fsync

        Show
        ASF subversion and git services added a comment - Commit 1682697 from Michael McCandless in branch 'dev/branches/lucene6508' [ https://svn.apache.org/r1682697 ] LUCENE-6508 : also verify lock on fsync
        Hide
        ASF subversion and git services added a comment -

        Commit 1682698 from Michael McCandless in branch 'dev/branches/lucene6508'
        [ https://svn.apache.org/r1682698 ]

        LUCENE-6508: remove sop

        Show
        ASF subversion and git services added a comment - Commit 1682698 from Michael McCandless in branch 'dev/branches/lucene6508' [ https://svn.apache.org/r1682698 ] LUCENE-6508 : remove sop
        Hide
        Michael McCandless added a comment -

        These tests are now angry:

        mike@haswell:/l/simplelocks$ mike@haswell:/l/simplelocks$    [junit4] Suite: org.apache.lucene.store.TestMockDirectoryWrapper
           [junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=TestMockDirectoryWrapper -Dtests.method=testFailIfIndexWriterNotClosed -Dtests.seed=F3213EF45EAF8F1D -Dtests.locale=ar_KW -Dtests.timezone=Asia/Chita -Dtests.asserts=true -Dtests.file.encoding=UTF-8
           [junit4] FAILURE 0.11s J2 | TestMockDirectoryWrapper.testFailIfIndexWriterNotClosed <<<
           [junit4]    > Throwable #1: java.lang.AssertionError
           [junit4]    > 	at __randomizedtesting.SeedInfo.seed([F3213EF45EAF8F1D:4472AAC3A3E2A5E8]:0)
           [junit4]    > 	at org.apache.lucene.store.TestMockDirectoryWrapper.testFailIfIndexWriterNotClosed(TestMockDirectoryWrapper.java:55)
           [junit4]    > 	at java.lang.Thread.run(Thread.java:745)
           [junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=TestMockDirectoryWrapper -Dtests.method=testFailIfIndexWriterNotClosedChangeLockFactory -Dtests.seed=F3213EF45EAF8F1D -Dtests.locale=ar_KW -Dtests.timezone=Asia/Chita -Dtests.asserts=true -Dtests.file.encoding=UTF-8
           [junit4] FAILURE 0.02s J2 | TestMockDirectoryWrapper.testFailIfIndexWriterNotClosedChangeLockFactory <<<
           [junit4]    > Throwable #1: java.lang.AssertionError
           [junit4]    > 	at __randomizedtesting.SeedInfo.seed([F3213EF45EAF8F1D:1980EAD08BE28029]:0)
           [junit4]    > 	at org.apache.lucene.store.TestMockDirectoryWrapper.testFailIfIndexWriterNotClosedChangeLockFactory(TestMockDirectoryWrapper.java:68)
           [junit4]    > 	at java.lang.Thread.run(Thread.java:745)
        

        We are not tracking openLocks anymore in MDW?

        Show
        Michael McCandless added a comment - These tests are now angry: mike@haswell:/l/simplelocks$ mike@haswell:/l/simplelocks$ [junit4] Suite: org.apache.lucene.store.TestMockDirectoryWrapper [junit4] 2> NOTE: reproduce with: ant test -Dtestcase=TestMockDirectoryWrapper -Dtests.method=testFailIfIndexWriterNotClosed -Dtests.seed=F3213EF45EAF8F1D -Dtests.locale=ar_KW -Dtests.timezone=Asia/Chita -Dtests.asserts=true -Dtests.file.encoding=UTF-8 [junit4] FAILURE 0.11s J2 | TestMockDirectoryWrapper.testFailIfIndexWriterNotClosed <<< [junit4] > Throwable #1: java.lang.AssertionError [junit4] > at __randomizedtesting.SeedInfo.seed([F3213EF45EAF8F1D:4472AAC3A3E2A5E8]:0) [junit4] > at org.apache.lucene.store.TestMockDirectoryWrapper.testFailIfIndexWriterNotClosed(TestMockDirectoryWrapper.java:55) [junit4] > at java.lang.Thread.run(Thread.java:745) [junit4] 2> NOTE: reproduce with: ant test -Dtestcase=TestMockDirectoryWrapper -Dtests.method=testFailIfIndexWriterNotClosedChangeLockFactory -Dtests.seed=F3213EF45EAF8F1D -Dtests.locale=ar_KW -Dtests.timezone=Asia/Chita -Dtests.asserts=true -Dtests.file.encoding=UTF-8 [junit4] FAILURE 0.02s J2 | TestMockDirectoryWrapper.testFailIfIndexWriterNotClosedChangeLockFactory <<< [junit4] > Throwable #1: java.lang.AssertionError [junit4] > at __randomizedtesting.SeedInfo.seed([F3213EF45EAF8F1D:1980EAD08BE28029]:0) [junit4] > at org.apache.lucene.store.TestMockDirectoryWrapper.testFailIfIndexWriterNotClosedChangeLockFactory(TestMockDirectoryWrapper.java:68) [junit4] > at java.lang.Thread.run(Thread.java:745) We are not tracking openLocks anymore in MDW?
        Hide
        Michael McCandless added a comment -

        I love the "changed by an external force" exception!

        Nice javadocs catch about now anciently wrong "IndexReader holding a write lock"!

        I wonder if we even need to pass a lock name anymore? Maybe it can
        just be Directory.obtainWriteLock()?

        I love how you implemented the lock timeout (wrapping with a sleeper,
        with awesome "This is not a good idea." javadocs) but I really think
        we should just remove the timeout: I don't see a valid use case
        ... but we can do this separately.

        The new BaseLockFactoryTestCase refactoring is awesome.

        I also love the LockValidatingDirectoryWrapper approach; hopefully the
        additional IO ops per desctructive op is not too costly on
        indexing/NRT reopen latency.

        Show
        Michael McCandless added a comment - I love the "changed by an external force" exception! Nice javadocs catch about now anciently wrong "IndexReader holding a write lock"! I wonder if we even need to pass a lock name anymore? Maybe it can just be Directory.obtainWriteLock()? I love how you implemented the lock timeout (wrapping with a sleeper, with awesome "This is not a good idea." javadocs) but I really think we should just remove the timeout: I don't see a valid use case ... but we can do this separately. The new BaseLockFactoryTestCase refactoring is awesome. I also love the LockValidatingDirectoryWrapper approach; hopefully the additional IO ops per desctructive op is not too costly on indexing/NRT reopen latency.
        Hide
        Robert Muir added a comment -

        We are not tracking openLocks anymore in MDW?

        Right, the AssertingLock in there was horrible. Its gone. We don't need sketchy mocking layers that will hide bugs. Start simple with the minimal stuff is the idea here.

        I love how you implemented the lock timeout (wrapping with a sleeper,
        with awesome "This is not a good idea." javadocs) but I really think
        we should just remove the timeout: I don't see a valid use case
        ... but we can do this separately.

        I don't want complaints about how i broke back compat behavior here, that happened the last time i fixed bugs in locks. Thats why you get a sleeper by default: I know how solr loves its sleeps!

        Show
        Robert Muir added a comment - We are not tracking openLocks anymore in MDW? Right, the AssertingLock in there was horrible. Its gone. We don't need sketchy mocking layers that will hide bugs. Start simple with the minimal stuff is the idea here. I love how you implemented the lock timeout (wrapping with a sleeper, with awesome "This is not a good idea." javadocs) but I really think we should just remove the timeout: I don't see a valid use case ... but we can do this separately. I don't want complaints about how i broke back compat behavior here, that happened the last time i fixed bugs in locks. Thats why you get a sleeper by default: I know how solr loves its sleeps!
        Hide
        Uwe Schindler added a comment - - edited

        I started fixing HdfsLockFactory and Solr:

        • I just need to fix tests for Hdfs, the impl is already there - will commit once I am ready
        • HDFS ensureValid() is not yet implemented, but I will more or less copyPaste from SimpleFS.
        • I had to remove the enforcing unlocker in Solr. I also removed the setting from SolrConfig. With NativeFSLockFactory it cannot happen anymore. Those using SimpleFS or HdFS need to manually unlock by deleting the file. I added a message about that to the SolrException on core startup.
        Show
        Uwe Schindler added a comment - - edited I started fixing HdfsLockFactory and Solr: I just need to fix tests for Hdfs, the impl is already there - will commit once I am ready HDFS ensureValid() is not yet implemented, but I will more or less copyPaste from SimpleFS. I had to remove the enforcing unlocker in Solr. I also removed the setting from SolrConfig. With NativeFSLockFactory it cannot happen anymore. Those using SimpleFS or HdFS need to manually unlock by deleting the file. I added a message about that to the SolrException on core startup.
        Hide
        ASF subversion and git services added a comment -

        Commit 1682732 from Uwe Schindler in branch 'dev/branches/lucene6508'
        [ https://svn.apache.org/r1682732 ]

        LUCENE-6508: First try to fix Solr and HDFS (completely untested, because I am on Windows)

        Show
        ASF subversion and git services added a comment - Commit 1682732 from Uwe Schindler in branch 'dev/branches/lucene6508' [ https://svn.apache.org/r1682732 ] LUCENE-6508 : First try to fix Solr and HDFS (completely untested, because I am on Windows)
        Hide
        Robert Muir added a comment -

        I wonder if we even need to pass a lock name anymore? Maybe it can
        just be Directory.obtainWriteLock()?

        I thought about this too, but for a number of reasons I dislike it:

        • leave the possibility of other locks for other purposes if we need them.
        • major conceptual "break" for any code using it today.
        • makes backwards compat/5.x impossible.

        On the other hand, with the current situation, i could see this stuff in a 5.3 release. The breaks are contained, locking is marked lucene.internal already anyway. And we could even add more back compat if we really really need it (e.g. add back Dir.makeLock deprecated with some wrapper over the now simpler api).

        I think its important that if make this kind of thing better, we get it out there... I made a lot of compromises to do this! I added back the specialized exceptions, I added sleepinglockfactory, added back IWconfig timeout, even enabled that by default (we can discuss somewhere). I even added back the horrible IndexWriter.isLocked, with big scary warnings:

           * @deprecated Use of this method can only lead to race conditions. Try
           *             to actually obtain a lock instead.
           */
          @Deprecated
          public static boolean isLocked(Directory directory) throws IOException {
        

        To me its worth the trouble...

        Show
        Robert Muir added a comment - I wonder if we even need to pass a lock name anymore? Maybe it can just be Directory.obtainWriteLock()? I thought about this too, but for a number of reasons I dislike it: leave the possibility of other locks for other purposes if we need them. major conceptual "break" for any code using it today. makes backwards compat/5.x impossible. On the other hand, with the current situation, i could see this stuff in a 5.3 release. The breaks are contained, locking is marked lucene.internal already anyway. And we could even add more back compat if we really really need it (e.g. add back Dir.makeLock deprecated with some wrapper over the now simpler api). I think its important that if make this kind of thing better, we get it out there... I made a lot of compromises to do this! I added back the specialized exceptions, I added sleepinglockfactory, added back IWconfig timeout, even enabled that by default (we can discuss somewhere). I even added back the horrible IndexWriter.isLocked, with big scary warnings: * @deprecated Use of this method can only lead to race conditions. Try * to actually obtain a lock instead. */ @Deprecated public static boolean isLocked(Directory directory) throws IOException { To me its worth the trouble...
        Hide
        ASF subversion and git services added a comment -

        Commit 1682861 from Robert Muir in branch 'dev/branches/lucene6508'
        [ https://svn.apache.org/r1682861 ]

        LUCENE-6508: add back lock validation, correctly, modulo PSDP...

        Show
        ASF subversion and git services added a comment - Commit 1682861 from Robert Muir in branch 'dev/branches/lucene6508' [ https://svn.apache.org/r1682861 ] LUCENE-6508 : add back lock validation, correctly, modulo PSDP...
        Hide
        Michael McCandless added a comment -

        I thought about this too, but for a number of reasons I dislike it:

        But locking is already an internal API, so we are free to do this simplification on 5.x?

        We are a search engine not a generic locking platform.

        It doesn't make sense that our locking impl be generic beyond what we need internally ("only one thing can write to a directory at a time").

        Or, maybe:

        leave the possibility of other locks for other purposes if we need them.

        ...you could shed light on what future locking needs Lucene might have? I feel like we should design for today, especially on a part of Lucene that has had so much trouble/bugs.

        Maybe as a compromise we could fix the API to refuse to make any lock that's not named IndexWriter.WRITE_LOCK_NAME.

        Anyway, I'm OK with leaving the API "generic" (pass any lock name you want) here... this branch already has enough great improvements. We should wrap it up and commit...

        Show
        Michael McCandless added a comment - I thought about this too, but for a number of reasons I dislike it: But locking is already an internal API, so we are free to do this simplification on 5.x? We are a search engine not a generic locking platform. It doesn't make sense that our locking impl be generic beyond what we need internally ("only one thing can write to a directory at a time"). Or, maybe: leave the possibility of other locks for other purposes if we need them. ...you could shed light on what future locking needs Lucene might have? I feel like we should design for today, especially on a part of Lucene that has had so much trouble/bugs. Maybe as a compromise we could fix the API to refuse to make any lock that's not named IndexWriter.WRITE_LOCK_NAME. Anyway, I'm OK with leaving the API "generic" (pass any lock name you want) here... this branch already has enough great improvements. We should wrap it up and commit...
        Hide
        ASF subversion and git services added a comment -

        Commit 1682867 from Robert Muir in branch 'dev/branches/lucene6508'
        [ https://svn.apache.org/r1682867 ]

        LUCENE-6508: fix IFD to use unwrapped dir for commit points, all tests pass...

        Show
        ASF subversion and git services added a comment - Commit 1682867 from Robert Muir in branch 'dev/branches/lucene6508' [ https://svn.apache.org/r1682867 ] LUCENE-6508 : fix IFD to use unwrapped dir for commit points, all tests pass...
        Hide
        Robert Muir added a comment -

        ...you could shed light on what future locking needs Lucene might have? I feel like we should design for today, especially on a part of Lucene that has had so much trouble/bugs.

        Well for example, we could have a commit lock again. I'm not pushing for it at all, I'm just saying its a valid option.

        As I explained, I went for the path of least resistance: you know adding some deprecated methods and even keeping defaults the same (like timeout stuff).

        I think this makes the change easier to digest, and as followups to this issue i'd propose:

        • re-evaluate default lock timeout in IWC (this could even be a version-based change or whatever we want to do, but i like a default of zero).
        • eradicate any usage of stuff like IndexWriter.isLocked, which IMO is generally a bug.
        • see if we can add back an AssertingLock and/or add lock mocking to mockfs, but one that doesn't suck or hide bugs.
        Show
        Robert Muir added a comment - ...you could shed light on what future locking needs Lucene might have? I feel like we should design for today, especially on a part of Lucene that has had so much trouble/bugs. Well for example, we could have a commit lock again. I'm not pushing for it at all, I'm just saying its a valid option. As I explained, I went for the path of least resistance: you know adding some deprecated methods and even keeping defaults the same (like timeout stuff). I think this makes the change easier to digest, and as followups to this issue i'd propose: re-evaluate default lock timeout in IWC (this could even be a version-based change or whatever we want to do, but i like a default of zero). eradicate any usage of stuff like IndexWriter.isLocked, which IMO is generally a bug. see if we can add back an AssertingLock and/or add lock mocking to mockfs, but one that doesn't suck or hide bugs.
        Hide
        ASF subversion and git services added a comment -

        Commit 1682870 from Robert Muir in branch 'dev/branches/lucene6508'
        [ https://svn.apache.org/r1682870 ]

        LUCENE-6508: AssertingLock is gone

        Show
        ASF subversion and git services added a comment - Commit 1682870 from Robert Muir in branch 'dev/branches/lucene6508' [ https://svn.apache.org/r1682870 ] LUCENE-6508 : AssertingLock is gone
        Hide
        ASF subversion and git services added a comment -

        Commit 1682882 from Robert Muir in branch 'dev/branches/lucene6508'
        [ https://svn.apache.org/r1682882 ]

        LUCENE-6508: don't make this code scary...

        Show
        ASF subversion and git services added a comment - Commit 1682882 from Robert Muir in branch 'dev/branches/lucene6508' [ https://svn.apache.org/r1682882 ] LUCENE-6508 : don't make this code scary...
        Hide
        Robert Muir added a comment -

        I made a patch from the branch changes here. (I removed some whitespace noise and added a CHANGES.txt message). I've done testing on linux, windows, and mac.

        Show
        Robert Muir added a comment - I made a patch from the branch changes here. (I removed some whitespace noise and added a CHANGES.txt message). I've done testing on linux, windows, and mac.
        Hide
        Michael McCandless added a comment -

        +1, patch looks great, thanks Rob.

        Show
        Michael McCandless added a comment - +1, patch looks great, thanks Rob.
        Hide
        ASF subversion and git services added a comment -

        Commit 1683073 from Uwe Schindler in branch 'dev/branches/lucene6508'
        [ https://svn.apache.org/r1683073 ]

        LUCENE-6508: Add a note that Solr no longer supports unlockOnStartup. Also remove from example configs.

        Show
        ASF subversion and git services added a comment - Commit 1683073 from Uwe Schindler in branch 'dev/branches/lucene6508' [ https://svn.apache.org/r1683073 ] LUCENE-6508 : Add a note that Solr no longer supports unlockOnStartup. Also remove from example configs.
        Hide
        Uwe Schindler added a comment -

        I just committed some Solr changes (removing unlocking on startup from config files). I also changes the error message that is displayed if index is locked on startup.

        Can you merge that into your patch. I have not seen your changes.txt in the branch. Should I add the Solr part to Solr's changes.txt?

        Otherwise I will review the patch after my talk later after buzzwords.

        Show
        Uwe Schindler added a comment - I just committed some Solr changes (removing unlocking on startup from config files). I also changes the error message that is displayed if index is locked on startup. Can you merge that into your patch. I have not seen your changes.txt in the branch. Should I add the Solr part to Solr's changes.txt? Otherwise I will review the patch after my talk later after buzzwords.
        Hide
        Robert Muir added a comment -

        I have not seen your changes.txt in the branch.

        That is because I added it to the patch (and removed silly whitespace noise). Don't worry, I will take care of this.

        Should I add the Solr part to Solr's changes.txt?

        please!

        Show
        Robert Muir added a comment - I have not seen your changes.txt in the branch. That is because I added it to the patch (and removed silly whitespace noise). Don't worry, I will take care of this. Should I add the Solr part to Solr's changes.txt? please!
        Hide
        ASF subversion and git services added a comment -

        Commit 1683088 from Uwe Schindler in branch 'dev/branches/lucene6508'
        [ https://svn.apache.org/r1683088 ]

        LUCENE-6508: Add changes text for Lucene and Solr

        Show
        ASF subversion and git services added a comment - Commit 1683088 from Uwe Schindler in branch 'dev/branches/lucene6508' [ https://svn.apache.org/r1683088 ] LUCENE-6508 : Add changes text for Lucene and Solr
        Hide
        Uwe Schindler added a comment -

        Hi Robert, I added the changes entries. I also copied the changes extry for lucene from your patch and committed.

        Show
        Uwe Schindler added a comment - Hi Robert, I added the changes entries. I also copied the changes extry for lucene from your patch and committed.
        Hide
        Uwe Schindler added a comment -

        Patch looks OK for me. I did not ran test on windows (only battery power).

        I am not sure if the SleepingLockFactory should be public: What happens if somebody sets lock timeouts in IndexWriterConfig but also wraps the directory...

        Show
        Uwe Schindler added a comment - Patch looks OK for me. I did not ran test on windows (only battery power). I am not sure if the SleepingLockFactory should be public: What happens if somebody sets lock timeouts in IndexWriterConfig but also wraps the directory...
        Hide
        Robert Muir added a comment -

        I can't argue with this. I forgot to make it un-public. My original intention was to remove the timeout stuff in IndexWriterConfig and you just use this Sleeping factory if you want sleeping and retrying.

        But I think a better path is to first look at changing the IndexWriterConfig default to be zero and deprecating it. If someone wants sleeping, we can just have that factory (and it has tests already). But i want all that under a separate issue from this one.

        Show
        Robert Muir added a comment - I can't argue with this. I forgot to make it un-public. My original intention was to remove the timeout stuff in IndexWriterConfig and you just use this Sleeping factory if you want sleeping and retrying. But I think a better path is to first look at changing the IndexWriterConfig default to be zero and deprecating it. If someone wants sleeping, we can just have that factory (and it has tests already). But i want all that under a separate issue from this one.
        Hide
        Uwe Schindler added a comment -

        Ok. That would be my proposal, too. I just had this in mind after I clicked the comment button.

        so +1 to a separate issue and deprecating it in IndexWriterConfig.

        Show
        Uwe Schindler added a comment - Ok. That would be my proposal, too. I just had this in mind after I clicked the comment button. so +1 to a separate issue and deprecating it in IndexWriterConfig.
        Hide
        Robert Muir added a comment -

        Updated patch with Uwe's changes and making SleepingLockFactory package-private.

        Show
        Robert Muir added a comment - Updated patch with Uwe's changes and making SleepingLockFactory package-private.
        Hide
        Uwe Schindler added a comment -

        +1 LGTM

        Show
        Uwe Schindler added a comment - +1 LGTM
        Hide
        ASF subversion and git services added a comment -

        Commit 1683606 from Robert Muir in branch 'dev/trunk'
        [ https://svn.apache.org/r1683606 ]

        LUCENE-6508: Simplify directory/lock API

        Show
        ASF subversion and git services added a comment - Commit 1683606 from Robert Muir in branch 'dev/trunk' [ https://svn.apache.org/r1683606 ] LUCENE-6508 : Simplify directory/lock API
        Hide
        ASF subversion and git services added a comment -

        Commit 1683609 from Robert Muir in branch 'dev/branches/branch_5x'
        [ https://svn.apache.org/r1683609 ]

        LUCENE-6508: Simplify directory/lock API

        Show
        ASF subversion and git services added a comment - Commit 1683609 from Robert Muir in branch 'dev/branches/branch_5x' [ https://svn.apache.org/r1683609 ] LUCENE-6508 : Simplify directory/lock API
        Hide
        Shalin Shekhar Mangar added a comment -

        Bulk close for 5.3.0 release

        Show
        Shalin Shekhar Mangar added a comment - Bulk close for 5.3.0 release
        Hide
        ASF subversion and git services added a comment -

        Commit 1698200 from hossman@apache.org in branch 'dev/trunk'
        [ https://svn.apache.org/r1698200 ]

        SOLR-7942: Previously removed unlockOnStartup option (LUCENE-6508) now logs warning if configured, will be an error in 6.0. Also improved error msg if an index is locked on startup

        Show
        ASF subversion and git services added a comment - Commit 1698200 from hossman@apache.org in branch 'dev/trunk' [ https://svn.apache.org/r1698200 ] SOLR-7942 : Previously removed unlockOnStartup option ( LUCENE-6508 ) now logs warning if configured, will be an error in 6.0. Also improved error msg if an index is locked on startup
        Hide
        ASF subversion and git services added a comment -

        Commit 1698203 from hossman@apache.org in branch 'dev/branches/branch_5x'
        [ https://svn.apache.org/r1698203 ]

        SOLR-7942: Previously removed unlockOnStartup option (LUCENE-6508) now logs warning if configured, will be an error in 6.0. Also improved error msg if an index is locked on startup (merge r1698200)

        Show
        ASF subversion and git services added a comment - Commit 1698203 from hossman@apache.org in branch 'dev/branches/branch_5x' [ https://svn.apache.org/r1698203 ] SOLR-7942 : Previously removed unlockOnStartup option ( LUCENE-6508 ) now logs warning if configured, will be an error in 6.0. Also improved error msg if an index is locked on startup (merge r1698200)

          People

          • Assignee:
            Uwe Schindler
            Reporter:
            Robert Muir
          • Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development