Lucene - Core
LUCENE-6507

NativeFSLock.close() can invalidate other locks

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 5.2, 6.0
    • Component/s: None
    • Labels: None
    • Lucene Fields: New

      Description

      The lock API in Lucene is super trappy: the lock that we return from this API must first be obtained, and if we can't obtain it the lock should not be closed, since we might e.g. close the underlying channel in the NativeLock case, which releases all locks for this file on some operating systems. I think the makeLock method should try to obtain, and only return a lock if we successfully obtained it. Not sure if that's possible everywhere, but we should at least make the documentation clear here.
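
      The trap described above can be shown with a minimal simulation. These are NOT the real Lucene classes; SimulatedNativeLock and its JVM_LOCKS set are hypothetical stand-ins for NativeFSLock and the JVM-wide native lock state.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical simulation of the hazard: closing an un-obtained lock
// still tears down shared state, just as closing the underlying
// FileChannel drops all of the JVM's locks on the file on some OSes.
class SimulatedNativeLock implements AutoCloseable {
    // Stands in for the JVM-wide native lock table, keyed by file path.
    static final Set<String> JVM_LOCKS = new HashSet<>();

    private final String path;
    private boolean obtained;

    SimulatedNativeLock(String path) {
        this.path = path;
    }

    // Only one lock per path may be held at a time.
    boolean obtain() {
        obtained = JVM_LOCKS.add(path);
        return obtained;
    }

    // Models the dangerous behavior: close() releases the lock on the
    // path even if THIS instance never obtained it.
    @Override
    public void close() {
        JVM_LOCKS.remove(path);
    }
}
```

      Obtaining a first lock, failing to obtain a second one on the same path, and then closing the failed one silently invalidates the first lock, which is exactly the hazard the description points at.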

      1. LUCENE-6507.patch
        21 kB
        Michael McCandless
      2. LUCENE-6507.patch
        19 kB
        Michael McCandless
      3. LUCENE-6507.patch
        18 kB
        Michael McCandless
      4. LUCENE-6507.patch
        17 kB
        Robert Muir
      5. LUCENE-6507.patch
        17 kB
        Robert Muir
      6. LUCENE-6507.patch
        19 kB
        Robert Muir
      7. LUCENE-6507.patch
        14 kB
        Robert Muir
      8. LUCENE-6507.patch
        14 kB
        Robert Muir
      9. LUCENE-6507.patch
        10 kB
        Simon Willnauer
      10. LUCENE-6507.patch
        5 kB
        Robert Muir
      11. LUCENE-6507.patch
        4 kB
        Robert Muir
      12. LUCENE-6507-410x.patch
        28 kB
        Michael McCandless

        Activity

        Uwe Schindler added a comment -

        The naming of this method was always trappy. I did not touch that last year when I refactored the lock factories.

        The method is just a factory to create an instance of the Lock class. Maybe it should be called "newLockInstance", then it would be clear what happens.

        Simon Willnauer added a comment -

        Why don't we just hide the obtain call behind it? I mean, we could have makeLock() and makeLock(long timeout) and only return the lock once it's obtained, to prevent the trappiness.

        Uwe Schindler added a comment - - edited

        Changing the behaviour of this already-existing method would be an unexpected change; code outside Lucene (like the Infinispan directory, ...) would fall over it. So we should break hard and invent a new name for the method, especially if we change behaviour, so that code that wants to lock a directory fails to compile. I think we have two possibilities:

        • newLockInstance() with the current behaviour
        • lockDirectory() or acquireLock() or whatever to do the actual locking, returning the Lock instance

        makeLock() should be removed so existing code fails to compile.

        I am sorry that I did not fix this before the release of Lucene 5.0. It was on my list, but I missed fixing the name or changing the behaviour.

        This change should also be reflected in the LockFactory class (not just in Directory)...

        Simon Willnauer added a comment -

        Yeah, this seems to go in the right direction IMO. I like acquireLock best; do we deprecate or delete the newLock method?

        Uwe Schindler added a comment -

        Yeah, I would like the new behaviour more. Unfortunately there is some code in IndexWriter that currently gets the Lock instance just to check if it is locked. We have to review this. Maybe we can simplify the whole thing.

        In any case, I will make (um, acquire) a proposal! Maybe the Backwards Compatibility Policeman has a good solution.

        I also found dead code: the abstract Lock.With class is no longer used, so we should remove it.

        Robert Muir added a comment -

        I disagree with the synopsis. The problem here has nothing to do with Directory or the lock API at all... these are all bugs in NativeFSLockFactory, around this behavior in the JDK:

        On some systems, closing a channel releases all locks held by the Java virtual machine on the underlying file regardless of whether the locks were acquired via that channel or via another channel open on the same file. It is strongly recommended that, within a program, a unique channel be used to acquire all locks on any given file. 
        

        -1 to changing the Directory/lock API, when it will not even fix the problem: even attempting to obtain() means you are screwed.

        To me the correct solution is to fix NativeFSLockFactory to follow the "strong recommendations". Today it already has a global map to try to workaround issues:

        private static final Set<String> LOCK_HELD = Collections.synchronizedSet(new HashSet<String>());
        

        Seems that this is wrong, and needs to be a map with unique file channels as the 'value'.
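
        A hedged sketch of that map-based idea (hypothetical shape, not the committed fix): the value identifies the one channel allowed to lock that file, so all locking on a given path goes through a single unique channel as the JDK recommends. A plain String stands in for FileChannel here to keep the demo self-contained.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical registry replacing the Set<String> LOCK_HELD: the map
// value records which channel claimed the path, so no other channel can
// lock (or release) that file.
class ChannelRegistry {
    private final Map<String, String> lockHeld = new ConcurrentHashMap<>();

    // Atomically claims the path for this channel; true on success.
    boolean claim(String path, String channel) {
        return lockHeld.putIfAbsent(path, channel) == null;
    }

    // Releases the claim only if the caller owns the claiming channel,
    // so one lock instance cannot clobber another's entry.
    boolean release(String path, String channel) {
        return lockHeld.remove(path, channel);
    }
}
```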

        Robert Muir added a comment -

        Also, LockStressTest from our ant task is not exercising the intra-JVM case; it has never failed on these things.

        We should be seeing test failures.

        But just staring at code, you can see the bugs/races:

        // obtain()
        if (obtained == false) { // not successful - clear up and move out
          clearLockHeld(path);                                                  // #1 clear global map
          final FileChannel toClose = channel;
          channel = null;
          IOUtils.closeWhileHandlingException(toClose);       // #2 close channel
        }
        ...
        
        // close()
        try {
          if (lock != null) {
            try {
              lock.release();
              lock = null;
            } finally {
              clearLockHeld(path);                    // #1 clear global map
            }
          }
        } finally {
          IOUtils.close(channel);                    // #2 close channel
          channel = null;
        }
        

        #1 and #2 happen in the wrong order.

        We must close the channel first for the current code to even stand a chance of working.

        IMO this should block the release.
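
        The race can be replayed deterministically with a simplified, hypothetical model (not the real NativeFSLockFactory code). If close() clears the LOCK_HELD entry (#1) before closing the channel (#2), a second locker can slip in between: it sees the path as free, locks via a new channel, and then step #2 of the first locker closes the old channel, which (per the JDK behavior quoted earlier) drops ALL of the JVM's locks on that file.

```java
import java.util.ArrayList;
import java.util.List;

// Deterministic replay of the #1/#2 ordering bug described above.
class CloseOrderingDemo {
    // Replays the interleaving; returns true iff locker B's lock survives.
    static boolean replay(boolean closeChannelBeforeClearingMap) {
        List<String> osLocksOnFile = new ArrayList<>(); // OS view of locks held
        osLocksOnFile.add("A");                         // locker A holds the lock

        if (closeChannelBeforeClearingMap) {
            osLocksOnFile.clear();  // #2 first: A's channel closed (B holds nothing yet)
            /* #1: LOCK_HELD cleared; the path becomes claimable again */
            osLocksOnFile.add("B"); // B claims the path, locks via its own channel
        } else {
            /* #1 first: LOCK_HELD cleared while A's channel is still open */
            osLocksOnFile.add("B"); // B slips in, locks via its own channel
            osLocksOnFile.clear();  // #2: A's channel closes -> ALL locks dropped, B's too
        }
        return osLocksOnFile.contains("B");
    }
}
```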

        Uwe Schindler added a comment -

        Hi Robert,
        I agree, but that is another problem. From reading Simon's issue description, this issue is not about the bug you are talking about.

        The problem described here is the following: if you call makeLock() on a directory, it just creates a lock instance but does not actually lock. That is confusing given the naming of the method: makeLock() makes you think the method acquires the lock and returns an instance of it. Because of the misleading naming, Simon then closed a lock that had never been obtained. That should be a no-op, so that is where the bug may be.

        This is why my initial response was to rename the stupidly named makeLock() to newLockInstance(). makeLock sounds like "this already creates the lock".

        Uwe Schindler added a comment -

        In any case, it would be nice to get a reference here to the failing Elasticsearch test, or an example of code broken by this, because it looks like Robert and Simon are talking about something completely different from what is described in the issue description. To me the description is quite clear: "Directory#makeLock only creates a lock instance but does not lock the directory."

        The bugs in NativeFSLockFactory should please be moved to a new issue; they are completely unrelated to the current one. Sorry.

        Robert Muir added a comment -

        The main missing thing I see (hopefully it will provoke the failure) is a simple multi-threaded unit test for lock factories that tests them directly.

        The current stress test with the best chance of provoking bugs here (_testStressLocks) is @Nightly, so I'm not sure how much real action it sees... and it is more of an integration test: it opens IndexWriters and IndexReaders. Additionally it uses MockDirectoryWrapper, so there is extra stuff happening, including locks being wrapped with mocks and so on.

        Robert Muir added a comment -

        It's precisely the same issue, Uwe. See Simon's description, and I quote:

        if we can't obtain it the lock should not be closed since we might ie. close the underlying channel in the NativeLock case which releases all lock for this file on some operating systems

        Simon Willnauer added a comment -

        I agree with Rob that there is a bug in this code, but honestly: once we have opened that channel we have to close it again, and that might release another process's lock. So I wonder how we can fix that?

        Uwe Schindler added a comment -

        Robert,
        That's another bug, not related to the summary of this issue; it's just "mentioned" in Simon's issue description (sorry, the description is hard to understand). It is not makeLock() that's buggy, it is NativeFSLock#close() that's buggy.

        This issue is about the bad naming and documentation of this method, and that is how I understood Simon Willnauer.

        Could we please split this issue into two?

        • this issue, about the bad naming: I agree with Simon here
        • a new issue to fix the NativeFSLock#close() method
        Simon Willnauer added a comment -

        Oh, never mind; this is only within the same JVM...

        Robert Muir added a comment -

        The separate-process case is not an issue; see the javadocs for FileLock again: the behavior is scoped to the given Java virtual machine.

        NativeFSLockFactory is buggy within the same JVM; that needs to be fixed here. Then close() has no crazy side effects.

        Uwe Schindler added a comment -

        As you both don't want to split this issue, I updated the summary/title.

        Robert Muir added a comment -

        Well, I think we should try to simplify the API, as always. But to me that is polishing the silverware while the house is burning down. We need to fix the bugs first.

        As far as simplifying the API goes, it would be great if it were just one single method that returned an already-obtained lock; then these locks wouldn't need any mutable state. I'm not sure if all the crazy methods we have are really needed; we should see what is the minimal thing we can get away with.
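
        A sketch of that single-method shape (hypothetical names; the actual API discussion happened separately): one call that either returns an already-obtained lock or throws, so the returned lock carries no obtained/not-obtained state and close() always means "release".

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical simplified API: obtainLock() returns a held lock or throws.
interface SimpleLockFactory {
    Closeable obtainLock(String name) throws IOException;
}

// Trivial single-instance implementation, for illustration only.
class SingleInstanceFactory implements SimpleLockFactory {
    private final Set<String> held = Collections.synchronizedSet(new HashSet<>());

    @Override
    public Closeable obtainLock(String name) throws IOException {
        if (!held.add(name)) {
            throw new IOException("Lock held elsewhere: " + name);
        }
        // Releasing the name is the ONLY side effect of close().
        return () -> held.remove(name);
    }
}
```

        With this shape there is no lock object on which close() can run before obtain() succeeded, which is the trap the issue description complains about.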

        Robert Muir added a comment -

        Here is a patch fixing the bugs I could see.

        Existing tests pass, but we should try to make one that fails without the fixes.

        Robert Muir added a comment -

        I still don't like that clearLockHeld calls toRealPath(), which can throw an exception / does I/O. We already do this canonicalization via toRealPath() in obtain(), which is synchronized, so I think we can just save the result in an instance variable and simplify this further. I will look into it.

        Robert Muir added a comment -

        Updated patch to avoid calling toRealPath() in the map-clearing.

        Simon Willnauer added a comment -

        Here is a patch that reproduces the problem on Linux. I also fixed several other lock impls that relied on the fact that we don't close if obtain fails. I didn't fix NativeFSLock itself yet, though.

        Robert Muir added a comment -

        Thanks Simon. Attached is a combined patch (mine + Simon's).

        Without the fix, simon's test triggers the NativeFSLockFactory bug on linux, tripping system assertions in sun.nio.ch.SharedFileLockTable. It passes with the patch.

           [junit4]   2> Caused by: java.lang.AssertionError
           [junit4]   2> 	at sun.nio.ch.SharedFileLockTable.removeKeyIfEmpty(FileLockTable.java:167)
           [junit4]   2> 	at sun.nio.ch.SharedFileLockTable.removeAll(FileLockTable.java:222)
           [junit4]   2> 	at sun.nio.ch.FileChannelImpl.implCloseChannel(FileChannelImpl.java:118)
           [junit4]   2> 	... 23 more
           [junit4]   2> 
        

        The other small fixes look good to me as well. I will test this patch on windows and mac but I think we are in good shape.

        Robert Muir added a comment -

        I opened a separate issue for API discussion: LUCENE-6508

        And I really agree that should happen, but this is a serious bug we need to fix.

        Robert Muir added a comment -

        Updated patch. Fixes a test bug in AssertingLock: it can't assert on close either, for the NoLockFactory case.

        E.g. TestIndexWriter.testNoSegmentFile uses this intentionally to create more than one IW on the same dir... this should be cleaned up later; I don't understand why tests need to do that.

        Robert Muir added a comment -

        Updated patch, fixing SimpleFSLock.close() to be safe as well.

        It passed before because tests never use SimpleFSLock. I fixed them to randomize between Native and Simple in all tests.

        This uncovers some new stuff, like this:

           [junit4] Suite: org.apache.lucene.index.TestCrashCausesCorruptIndex
           [junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=TestCrashCausesCorruptIndex -Dtests.method=testCrashCorruptsIndexing -Dtests.seed=8DBA6BF430E29A48 -Dtests.locale=no_NO_NY -Dtests.timezone=SystemV/PST8 -Dtests.asserts=true -Dtests.file.encoding=US-ASCII
           [junit4] ERROR   1.37s | TestCrashCausesCorruptIndex.testCrashCorruptsIndexing <<<
           [junit4]    > Throwable #1: org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out: org.apache.lucene.store.MockDirectoryWrapper$AssertingLock@17e1509f
           [junit4]    > 	at __randomizedtesting.SeedInfo.seed([8DBA6BF430E29A48:FA3DFBC572D35AA3]:0)
           [junit4]    > 	at org.apache.lucene.store.Lock.obtain(Lock.java:89)
           [junit4]    > 	at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:775)
           [junit4]    > 	at org.apache.lucene.index.TestCrashCausesCorruptIndex.indexAfterRestart(TestCrashCausesCorruptIndex.java:98)
           [junit4]    > 	at org.apache.lucene.index.TestCrashCausesCorruptIndex.testCrashCorruptsIndexing(TestCrashCausesCorruptIndex.java:50)
           [junit4]    > 	at java.lang.Thread.run(Thread.java:745)
           [junit4]   2> NOTE: leaving temporary files on disk at: /home/rmuir/workspace/trunk-iw/lucene/build/core/test/J0/temp/lucene.index.TestCrashCausesCorruptIndex 8DBA6BF430E29A48-001
           [junit4]   2> NOTE: test params are: codec=HighCompressionCompressingStoredFields(storedFieldsFormat=CompressingStoredFieldsFormat(compressionMode=HIGH_COMPRESSION, chunkSize=32683, maxDocsPerChunk=8, blockSize=6), termVectorsFormat=CompressingTermVectorsFormat(compressionMode=HIGH_COMPRESSION, chunkSize=32683, blockSize=6)), sim=RandomSimilarityProvider(queryNorm=false,coord=crazy): {text=DFR GL1}, locale=no_NO_NY, timezone=SystemV/PST8
           [junit4]   2> NOTE: Linux 3.13.0-49-generic amd64/Oracle Corporation 1.8.0_45 (64-bit)/cpus=8,threads=1,free=208117928,total=253231104
           [junit4]   2> NOTE: All tests run in this JVM: [TestCrashCausesCorruptIndex]
           [junit4] Completed [1/1] in 1.55s, 1 test, 1 error <<< FAILURES!
        

        I have not looked into that one yet.

        Robert Muir added a comment -

        I understand the last failure now: it's a test bug, because I randomized lock factories in newDirectory().

        This test calls newDirectory() over the same existing index, but switches up lock factories. Native doesn't delete its lock file (for obvious reasons), but Simple depends on the existence of the file, so it times out.

        I will fix the patch...

        Robert Muir added a comment -

        Here is a simpler approach for this bugfix: don't randomize the lock factory in all tests, just in Simon's new one.

        Defer randomization across all tests to https://issues.apache.org/jira/browse/LUCENE-6509, where we can prevent test bugs.

        I don't want to totally destabilize tests in this way right now.

        Uwe Schindler added a comment - - edited

        Hi Robert, thanks for fixing this and opening the new issues. Sorry for the confusing discussion today. These were actually two bugs: an API naming inconsistency, and the mutable Locks in combination with the broken clone() behaviour on the Lock instance.

        I will look into your fixes and test them; until the locks become immutable (I will work on this, for sure; I just have to prepare my talk for next week) this is a good fix. Thanks.

        Robert Muir added a comment -

        I tested the latest patch on Linux, Mac, and Windows. Mike also helped with some distributed beasting of tests. I think it's ready.

        Robert Muir added a comment -

        Updated patch removing my changes to SimpleFSLockFactory.isLocked().

        I didn't mean to change the semantics of this totally unnecessary method (unused by Lucene).

        Of course, no tests fail either way; this is bogus, unnecessary stuff in our locking API.

        It's a search engine library, not a file-locking library... IndexWriter.isLocked needs to die, like, as fast as possible, as does Lock.isLocked.

        We can't even get the basics right; I don't know why we have stupid methods like this.

        Michael McCandless added a comment -

        114 iterations of all Lucene core+module tests and no failures ...

        Show
        Michael McCandless added a comment - 114 iterations of all Lucene core+module tests and no failures ...
        Hide
        Michael McCandless added a comment -

        New patch, fixing SingleInstanceLF to not set obtained to false if you try to obtain it twice, plus a failing test.

        I also had to fix/relax MockDirectoryWrapper.AssertingLock's behavior if you call .obtain twice on a single lock ... it was clearing its obtained member, but I don't think it should.

        Show
        Michael McCandless added a comment - New patch, fixing SingleInstanceLF to not set obtained to false if you try to obtain it twice, plus a failing test. I also had to fix/relax MockDirectoryWrapper.AssertingLock's behavior if you call .obtain twice on a single lock ... it was clearing its obtained member, but I don't think it should.
        Hide
        Robert Muir added a comment -

        I also had to fix/relax MockDirectoryWrapper.AssertingLock's behavior if you call .obtain twice on a single lock ... it was clearing its obtained member, but I don't think it should.

        IMO we should deliver an exception if you do this. There is no need for leniency that returns false.

        Show
        Robert Muir added a comment - I also had to fix/relax MockDirectoryWrapper.AssertingLock's behavior if you call .obtain twice on a single lock ... it was clearing its obtained member, but I don't think it should. IMO we should deliver an exception if you do this. There is no need for leniency that returns false.
        Hide
        Michael McCandless added a comment -

        IMO we should deliver an exception if you do this.

        Good idea, I changed it to throw LockObtainFailedExc if you (stupidly) try to call .obtain twice on a single instance, and added test cases for the 3 core LockFactory impls (minus NoLockFactory).

        Show
        Michael McCandless added a comment - IMO we should deliver an exception if you do this. Good idea, I changed it to throw LockObtainFailedExc if you (stupidly) try to call .obtain twice on a single instance, and added test cases for the 3 core LockFactory impls (minus NoLockFactory).
        Hide
        Michael McCandless added a comment -

        Another iteration:

        • Also throw exc on double obtain to HdfsLockFactory
        • Put back accidental test change in my last patch
        • Other minor cleanups

        I think patch is ready; that's a good catch in VerifyingLockFactory: it should NOT be trusting the LockFactory's isLocked impl...

        Show
        Michael McCandless added a comment - Another iteration: Also throw exc on double obtain to HdfsLockFactory Put back accidental test change in my last patch Other minor cleanups I think patch is ready; that's a good catch in VerifyingLockFactory: it should NOT be trusting the LockFactory's isLocked impl...
        Hide
        Uwe Schindler added a comment -

        +1 much better

        Show
        Uwe Schindler added a comment - +1 much better
        Hide
        Robert Muir added a comment -

        Thanks for the additional cleanups mike! +1 from me.

        Show
        Robert Muir added a comment - Thanks for the additional cleanups mike! +1 from me.
        Hide
        Michael McCandless added a comment -

        Thanks guys, I'll commit & backport...

        Show
        Michael McCandless added a comment - Thanks guys, I'll commit & backport...
        Hide
        ASF subversion and git services added a comment -

        Commit 1682327 from Michael McCandless in branch 'dev/trunk'
        [ https://svn.apache.org/r1682327 ]

        LUCENE-6507: don't let NativeFSLock.close release other locks

        Show
        ASF subversion and git services added a comment - Commit 1682327 from Michael McCandless in branch 'dev/trunk' [ https://svn.apache.org/r1682327 ] LUCENE-6507 : don't let NativeFSLock.close release other locks
        Hide
        ASF subversion and git services added a comment -

        Commit 1682329 from Michael McCandless in branch 'dev/branches/branch_5x'
        [ https://svn.apache.org/r1682329 ]

        LUCENE-6507: don't let NativeFSLock.close release other locks

        Show
        ASF subversion and git services added a comment - Commit 1682329 from Michael McCandless in branch 'dev/branches/branch_5x' [ https://svn.apache.org/r1682329 ] LUCENE-6507 : don't let NativeFSLock.close release other locks
        Hide
        ASF subversion and git services added a comment -

        Commit 1682335 from Michael McCandless in branch 'dev/branches/lucene_solr_5_2'
        [ https://svn.apache.org/r1682335 ]

        LUCENE-6507: don't let NativeFSLock.close release other locks

        Show
        ASF subversion and git services added a comment - Commit 1682335 from Michael McCandless in branch 'dev/branches/lucene_solr_5_2' [ https://svn.apache.org/r1682335 ] LUCENE-6507 : don't let NativeFSLock.close release other locks
        Hide
        Steve Rowe added a comment -

        I see an HdfsLockFactoryTest failure on 5.2 after this commit: http://jenkins.sarowe.net/job/Lucene-Solr-tests-5.2-Java8/3/.

          [junit4] Suite: org.apache.solr.store.hdfs.HdfsLockFactoryTest
          [junit4]   2> Creating dataDir: /var/lib/jenkins/jobs/Lucene-Solr-tests-5.2-Java8/workspace/solr/build/solr-core/test/J5/temp/solr.store.hdfs.HdfsLockFactoryTest B48BC404BF6BB3F1-001/init-core-data-001
          [junit4]   2> 123149 T2061 oas.SolrTestCaseJ4.buildSSLConfig Randomized ssl (false) and clientAuth (false)
          [junit4]   2> 124356 T2061 oahu.NativeCodeLoader.<clinit> WARN Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
          [junit4]   1> Formatting using clusterid: testClusterID
          [junit4]   2> 125343 T2061 oahmi.MetricsConfig.loadFirst WARN Cannot locate configuration: tried hadoop-metrics2-namenode.properties,hadoop-metrics2.properties
          [junit4]   2> 125705 T2061 oml.Slf4jLog.info Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
          [junit4]   2> 125714 T2061 oahh.HttpRequestLog.getRequestLog WARN Jetty request log can only be enabled using Log4j
          [junit4]   2> 126082 T2061 oml.Slf4jLog.info jetty-6.1.26
          [junit4]   2> 126200 T2061 oml.Slf4jLog.info Extract jar:file:/var/lib/jenkins/.ivy2/cache/org.apache.hadoop/hadoop-hdfs/tests/hadoop-hdfs-2.6.0-tests.jar!/webapps/hdfs to ./temp/Jetty_localhost_45038_hdfs____jfnzfi/webapp
          [junit4]   2> 126577 T2061 oml.Slf4jLog.info NO JSP Support for /, did not find org.apache.jasper.servlet.JspServlet
          [junit4]   2> 127704 T2061 oml.Slf4jLog.info Started HttpServer2$SelectChannelConnectorWithSafeStartup@localhost:45038
          [junit4]   2> 129764 T2061 oahh.HttpRequestLog.getRequestLog WARN Jetty request log can only be enabled using Log4j
          [junit4]   2> 129777 T2061 oml.Slf4jLog.info jetty-6.1.26
          [junit4]   2> 129841 T2061 oml.Slf4jLog.info Extract jar:file:/var/lib/jenkins/.ivy2/cache/org.apache.hadoop/hadoop-hdfs/tests/hadoop-hdfs-2.6.0-tests.jar!/webapps/datanode to ./temp/Jetty_localhost_60765_datanode____fxr31f/webapp
          [junit4]   2> 130228 T2061 oml.Slf4jLog.info NO JSP Support for /, did not find org.apache.jasper.servlet.JspServlet
          [junit4]   2> 131028 T2061 oml.Slf4jLog.info Started HttpServer2$SelectChannelConnectorWithSafeStartup@localhost:60765
          [junit4]   2> 131767 T2061 oahh.HttpRequestLog.getRequestLog WARN Jetty request log can only be enabled using Log4j
          [junit4]   2> 131769 T2061 oml.Slf4jLog.info jetty-6.1.26
          [junit4]   2> 131799 T2061 oml.Slf4jLog.info Extract jar:file:/var/lib/jenkins/.ivy2/cache/org.apache.hadoop/hadoop-hdfs/tests/hadoop-hdfs-2.6.0-tests.jar!/webapps/datanode to ./temp/Jetty_localhost_45594_datanode____xb8eu/webapp
          [junit4]   2> 132058 T2061 oml.Slf4jLog.info NO JSP Support for /, did not find org.apache.jasper.servlet.JspServlet
          [junit4]   2> 132856 T2061 oml.Slf4jLog.info Started HttpServer2$SelectChannelConnectorWithSafeStartup@localhost:45594
          [junit4]   2> 135493 T2088 oahhsb.BlockManager.processReport BLOCK* processReport: from storage DS-c7286d21-0c75-425a-b32a-cda888b89811 node DatanodeRegistration(127.0.0.1, datanodeUuid=183b04a7-dc21-4d3e-ad8c-52569c645c0d, infoPort=60765, ipcPort=36559, storageInfo=lv=-56;cid=testClusterID;nsid=349984326;c=0), blocks: 0, hasStaleStorages: true, processing time: 2 msecs
          [junit4]   2> 135501 T2088 oahhsb.BlockManager.processReport BLOCK* processReport: from storage DS-2cbc54c3-5182-44c8-b80a-a4af3d3bea02 node DatanodeRegistration(127.0.0.1, datanodeUuid=183b04a7-dc21-4d3e-ad8c-52569c645c0d, infoPort=60765, ipcPort=36559, storageInfo=lv=-56;cid=testClusterID;nsid=349984326;c=0), blocks: 0, hasStaleStorages: false, processing time: 0 msecs
          [junit4]   2> 135493 T2097 oahhsb.BlockManager.processReport BLOCK* processReport: from storage DS-3c1bd938-8f62-4608-a6d4-63f576e970cd node DatanodeRegistration(127.0.0.1, datanodeUuid=3e441a70-defc-4b2f-bf7f-95c351d97a39, infoPort=45594, ipcPort=58299, storageInfo=lv=-56;cid=testClusterID;nsid=349984326;c=0), blocks: 0, hasStaleStorages: true, processing time: 1 msecs
          [junit4]   2> 135511 T2097 oahhsb.BlockManager.processReport BLOCK* processReport: from storage DS-7b0f0027-583c-462f-9b9c-e6f13ca8a160 node DatanodeRegistration(127.0.0.1, datanodeUuid=3e441a70-defc-4b2f-bf7f-95c351d97a39, infoPort=45594, ipcPort=58299, storageInfo=lv=-56;cid=testClusterID;nsid=349984326;c=0), blocks: 0, hasStaleStorages: false, processing time: 0 msecs
          [junit4]   2> 135786 T2061 oas.SolrTestCaseJ4.setUp ###Starting testBasic
          [junit4]   2> 135956 T2061 oassh.HdfsDirectory.<init> WARN The NameNode is in SafeMode - Solr will wait 5 seconds and try again.
          [junit4]   2> 141161 T2061 oas.SolrTestCaseJ4.tearDown ###Ending testBasic
          [junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=HdfsLockFactoryTest -Dtests.method=testBasic -Dtests.seed=B48BC404BF6BB3F1 -Dtests.slow=true -Dtests.locale=de_GR -Dtests.timezone=Antarctica/Vostok -Dtests.asserts=true -Dtests.file.encoding=ISO-8859-1
          [junit4] ERROR   5.48s J5  | HdfsLockFactoryTest.testBasic <<<
          [junit4]    > Throwable #1: org.apache.lucene.store.LockObtainFailedException: this lock instance was already obtained
          [junit4]    > 	at __randomizedtesting.SeedInfo.seed([B48BC404BF6BB3F1:1F71D91160B735DF]:0)
          [junit4]    > 	at org.apache.solr.store.hdfs.HdfsLockFactory$HdfsLock.obtain(HdfsLockFactory.java:71)
          [junit4]    > 	at org.apache.solr.store.hdfs.HdfsLockFactoryTest.testBasic(HdfsLockFactoryTest.java:75)
          [junit4]    > 	at java.lang.Thread.run(Thread.java:745)
          [junit4]   2> 141271 T2061 oas.SolrTestCaseJ4.setUp ###Starting testDoubleObtain
          [junit4]   2> 141393 T2061 oassh.HdfsDirectory.close Closing hdfs directory hdfs://localhost:40796/basedir/lock
          [junit4]   2> 141394 T2061 oas.SolrTestCaseJ4.tearDown ###Ending testDoubleObtain
          [junit4]   2> 141394 T2061 oahhsd.DirectoryScanner.shutdown WARN DirectoryScanner: shutdown has been called
          [junit4]   2> 141435 T2061 oml.Slf4jLog.info Stopped HttpServer2$SelectChannelConnectorWithSafeStartup@localhost:0
          [junit4]   2> 141436 T2133 oahh.HttpServer2$SelectChannelConnectorWithSafeStartup.isRunning WARN HttpServer Acceptor: isRunning is false. Rechecking.
          [junit4]   2> 141437 T2133 oahh.HttpServer2$SelectChannelConnectorWithSafeStartup.isRunning WARN HttpServer Acceptor: isRunning is false
          [junit4]   2> 141538 T2143 oahhsd.BPServiceActor.offerService WARN BPOfferService for Block pool BP-1715631016-127.0.1.1-1432850493139 (Datanode Uuid 3e441a70-defc-4b2f-bf7f-95c351d97a39) service to localhost/127.0.0.1:40796 interrupted
          [junit4]   2> 141538 T2143 oahhsd.BPServiceActor.run WARN Ending block pool service for: Block pool BP-1715631016-127.0.1.1-1432850493139 (Datanode Uuid 3e441a70-defc-4b2f-bf7f-95c351d97a39) service to localhost/127.0.0.1:40796
          [junit4]   2> 141542 T2061 oahhsd.DirectoryScanner.shutdown WARN DirectoryScanner: shutdown has been called
          [junit4]   2> 141559 T2061 oml.Slf4jLog.info Stopped HttpServer2$SelectChannelConnectorWithSafeStartup@localhost:0
          [junit4]   2> 141559 T2109 oahh.HttpServer2$SelectChannelConnectorWithSafeStartup.isRunning WARN HttpServer Acceptor: isRunning is false. Rechecking.
          [junit4]   2> 141571 T2109 oahh.HttpServer2$SelectChannelConnectorWithSafeStartup.isRunning WARN HttpServer Acceptor: isRunning is false
          [junit4]   2> 141580 T2116 oahhsd.BPServiceActor.offerService WARN BPOfferService for Block pool BP-1715631016-127.0.1.1-1432850493139 (Datanode Uuid 183b04a7-dc21-4d3e-ad8c-52569c645c0d) service to localhost/127.0.0.1:40796 interrupted
          [junit4]   2> 141588 T2116 oahhsd.BPServiceActor.run WARN Ending block pool service for: Block pool BP-1715631016-127.0.1.1-1432850493139 (Datanode Uuid 183b04a7-dc21-4d3e-ad8c-52569c645c0d) service to localhost/127.0.0.1:40796
          [junit4]   2> 141598 T2087 oahhsb.DecommissionManager$Monitor.run WARN Monitor interrupted: java.lang.InterruptedException: sleep interrupted
          [junit4]   2> 141616 T2061 oml.Slf4jLog.info Stopped HttpServer2$SelectChannelConnectorWithSafeStartup@localhost:0
          [junit4]   2> 141722 T2061 oahml.MethodMetric$2.snapshot ERROR Error invoking method getBlocksTotal java.lang.reflect.InvocationTargetException
          [junit4]   2> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
          [junit4]   2> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
          [junit4]   2> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
          [junit4]   2> 	at java.lang.reflect.Method.invoke(Method.java:497)
          [junit4]   2> 	at org.apache.hadoop.metrics2.lib.MethodMetric$2.snapshot(MethodMetric.java:111)
          [junit4]   2> 	at org.apache.hadoop.metrics2.lib.MethodMetric.snapshot(MethodMetric.java:144)
          [junit4]   2> 	at org.apache.hadoop.metrics2.lib.MetricsRegistry.snapshot(MetricsRegistry.java:387)
          [junit4]   2> 	at org.apache.hadoop.metrics2.lib.MetricsSourceBuilder$1.getMetrics(MetricsSourceBuilder.java:79)
          [junit4]   2> 	at org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.getMetrics(MetricsSourceAdapter.java:195)
          [junit4]   2> 	at org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.updateJmxCache(MetricsSourceAdapter.java:172)
          [junit4]   2> 	at org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.getMBeanInfo(MetricsSourceAdapter.java:151)
          [junit4]   2> 	at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getClassName(DefaultMBeanServerInterceptor.java:1804)
          [junit4]   2> 	at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.safeGetClassName(DefaultMBeanServerInterceptor.java:1595)
          [junit4]   2> 	at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.checkMBeanPermission(DefaultMBeanServerInterceptor.java:1813)
          [junit4]   2> 	at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.exclusiveUnregisterMBean(DefaultMBeanServerInterceptor.java:430)
          [junit4]   2> 	at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.unregisterMBean(DefaultMBeanServerInterceptor.java:415)
          [junit4]   2> 	at com.sun.jmx.mbeanserver.JmxMBeanServer.unregisterMBean(JmxMBeanServer.java:546)
          [junit4]   2> 	at org.apache.hadoop.metrics2.util.MBeans.unregister(MBeans.java:81)
          [junit4]   2> 	at org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.stopMBeans(MetricsSourceAdapter.java:227)
          [junit4]   2> 	at org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.stop(MetricsSourceAdapter.java:212)
          [junit4]   2> 	at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.stopSources(MetricsSystemImpl.java:461)
          [junit4]   2> 	at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.stop(MetricsSystemImpl.java:212)
          [junit4]   2> 	at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.shutdown(MetricsSystemImpl.java:592)
          [junit4]   2> 	at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.shutdownInstance(DefaultMetricsSystem.java:72)
          [junit4]   2> 	at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.shutdown(DefaultMetricsSystem.java:68)
          [junit4]   2> 	at org.apache.hadoop.hdfs.server.namenode.metrics.NameNodeMetrics.shutdown(NameNodeMetrics.java:145)
          [junit4]   2> 	at org.apache.hadoop.hdfs.server.namenode.NameNode.stop(NameNode.java:822)
          [junit4]   2> 	at org.apache.hadoop.hdfs.MiniDFSCluster.shutdown(MiniDFSCluster.java:1720)
          [junit4]   2> 	at org.apache.hadoop.hdfs.MiniDFSCluster.shutdown(MiniDFSCluster.java:1699)
          [junit4]   2> 	at org.apache.solr.cloud.hdfs.HdfsTestUtil.teardownClass(HdfsTestUtil.java:197)
          [junit4]   2> 	at org.apache.solr.store.hdfs.HdfsLockFactoryTest.afterClass(HdfsLockFactoryTest.java:52)
          [junit4]   2> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
          [junit4]   2> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
          [junit4]   2> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
          [junit4]   2> 	at java.lang.reflect.Method.invoke(Method.java:497)
          [junit4]   2> 	at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1627)
          [junit4]   2> 	at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:799)
          [junit4]   2> 	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
          [junit4]   2> 	at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57)
          [junit4]   2> 	at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
          [junit4]   2> 	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
          [junit4]   2> 	at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42)
          [junit4]   2> 	at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
          [junit4]   2> 	at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
          [junit4]   2> 	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
          [junit4]   2> 	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
          [junit4]   2> 	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
          [junit4]   2> 	at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:54)
          [junit4]   2> 	at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
          [junit4]   2> 	at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65)
          [junit4]   2> 	at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55)
          [junit4]   2> 	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
          [junit4]   2> 	at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:365)
          [junit4]   2> 	at java.lang.Thread.run(Thread.java:745)
          [junit4]   2> Caused by: java.lang.NullPointerException
          [junit4]   2> 	at org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap.size(BlocksMap.java:198)
          [junit4]   2> 	at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.getTotalBlocks(BlockManager.java:3291)
          [junit4]   2> 	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlocksTotal(FSNamesystem.java:6223)
          [junit4]   2> 	... 54 more
          [junit4]   2> 
          [junit4]   2> 141730 T2061 oas.SolrTestCaseJ4.deleteCore ###deleteCore
          [junit4]   2> NOTE: leaving temporary files on disk at: /var/lib/jenkins/jobs/Lucene-Solr-tests-5.2-Java8/workspace/solr/build/solr-core/test/J5/temp/solr.store.hdfs.HdfsLockFactoryTest B48BC404BF6BB3F1-001
          [junit4]   2> 18586 T2060 ccr.ThreadLeakControl.checkThreadLeaks WARNING Will linger awaiting termination of 2 leaked thread(s).
          [junit4]   2> NOTE: test params are: codec=CheapBastard, sim=DefaultSimilarity, locale=de_GR, timezone=Antarctica/Vostok
          [junit4]   2> NOTE: Linux 4.0.2 amd64/Oracle Corporation 1.8.0_45 (64-bit)/cpus=16,threads=3,free=274346960,total=521142272
          [junit4]   2> NOTE: All tests run in this JVM: [TestDistributedMissingSort, TestRandomMergePolicy, TestSolrConfigHandlerCloud, TestSweetSpotSimilarityFactory, TestSolrJ, TestStressLucene, TestBinaryResponseWriter, TestIndexSearcher, HdfsLockFactoryTest]
          [junit4] Completed [164/497] on J5 in 28.44s, 2 tests, 1 error <<< FAILURES!
        
        Show
        Steve Rowe added a comment - I see an HdfsLockFactoryTest failure on 5.2 after this commit: http://jenkins.sarowe.net/job/Lucene-Solr-tests-5.2-Java8/3/ . [junit4] Suite: org.apache.solr.store.hdfs.HdfsLockFactoryTest [junit4] 2> Creating dataDir: /var/lib/jenkins/jobs/Lucene-Solr-tests-5.2-Java8/workspace/solr/build/solr-core/test/J5/temp/solr.store.hdfs.HdfsLockFactoryTest B48BC404BF6BB3F1-001/init-core-data-001 [junit4] 2> 123149 T2061 oas.SolrTestCaseJ4.buildSSLConfig Randomized ssl (false) and clientAuth (false) [junit4] 2> 124356 T2061 oahu.NativeCodeLoader.<clinit> WARN Unable to load native-hadoop library for your platform... using builtin-java classes where applicable [junit4] 1> Formatting using clusterid: testClusterID [junit4] 2> 125343 T2061 oahmi.MetricsConfig.loadFirst WARN Cannot locate configuration: tried hadoop-metrics2-namenode.properties,hadoop-metrics2.properties [junit4] 2> 125705 T2061 oml.Slf4jLog.info Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog [junit4] 2> 125714 T2061 oahh.HttpRequestLog.getRequestLog WARN Jetty request log can only be enabled using Log4j [junit4] 2> 126082 T2061 oml.Slf4jLog.info jetty-6.1.26 [junit4] 2> 126200 T2061 oml.Slf4jLog.info Extract jar:file:/var/lib/jenkins/.ivy2/cache/org.apache.hadoop/hadoop-hdfs/tests/hadoop-hdfs-2.6.0-tests.jar!/webapps/hdfs to ./temp/Jetty_localhost_45038_hdfs____jfnzfi/webapp [junit4] 2> 126577 T2061 oml.Slf4jLog.info NO JSP Support for /, did not find org.apache.jasper.servlet.JspServlet [junit4] 2> 127704 T2061 oml.Slf4jLog.info Started HttpServer2$SelectChannelConnectorWithSafeStartup@localhost:45038 [junit4] 2> 129764 T2061 oahh.HttpRequestLog.getRequestLog WARN Jetty request log can only be enabled using Log4j [junit4] 2> 129777 T2061 oml.Slf4jLog.info jetty-6.1.26 [junit4] 2> 129841 T2061 oml.Slf4jLog.info Extract 
jar:file:/var/lib/jenkins/.ivy2/cache/org.apache.hadoop/hadoop-hdfs/tests/hadoop-hdfs-2.6.0-tests.jar!/webapps/datanode to ./temp/Jetty_localhost_60765_datanode____fxr31f/webapp [junit4] 2> 130228 T2061 oml.Slf4jLog.info NO JSP Support for /, did not find org.apache.jasper.servlet.JspServlet [junit4] 2> 131028 T2061 oml.Slf4jLog.info Started HttpServer2$SelectChannelConnectorWithSafeStartup@localhost:60765 [junit4] 2> 131767 T2061 oahh.HttpRequestLog.getRequestLog WARN Jetty request log can only be enabled using Log4j [junit4] 2> 131769 T2061 oml.Slf4jLog.info jetty-6.1.26 [junit4] 2> 131799 T2061 oml.Slf4jLog.info Extract jar:file:/var/lib/jenkins/.ivy2/cache/org.apache.hadoop/hadoop-hdfs/tests/hadoop-hdfs-2.6.0-tests.jar!/webapps/datanode to ./temp/Jetty_localhost_45594_datanode____xb8eu/webapp [junit4] 2> 132058 T2061 oml.Slf4jLog.info NO JSP Support for /, did not find org.apache.jasper.servlet.JspServlet [junit4] 2> 132856 T2061 oml.Slf4jLog.info Started HttpServer2$SelectChannelConnectorWithSafeStartup@localhost:45594 [junit4] 2> 135493 T2088 oahhsb.BlockManager.processReport BLOCK* processReport: from storage DS-c7286d21-0c75-425a-b32a-cda888b89811 node DatanodeRegistration(127.0.0.1, datanodeUuid=183b04a7-dc21-4d3e-ad8c-52569c645c0d, infoPort=60765, ipcPort=36559, storageInfo=lv=-56;cid=testClusterID;nsid=349984326;c=0), blocks: 0, hasStaleStorages: true, processing time: 2 msecs [junit4] 2> 135501 T2088 oahhsb.BlockManager.processReport BLOCK* processReport: from storage DS-2cbc54c3-5182-44c8-b80a-a4af3d3bea02 node DatanodeRegistration(127.0.0.1, datanodeUuid=183b04a7-dc21-4d3e-ad8c-52569c645c0d, infoPort=60765, ipcPort=36559, storageInfo=lv=-56;cid=testClusterID;nsid=349984326;c=0), blocks: 0, hasStaleStorages: false, processing time: 0 msecs [junit4] 2> 135493 T2097 oahhsb.BlockManager.processReport BLOCK* processReport: from storage DS-3c1bd938-8f62-4608-a6d4-63f576e970cd node DatanodeRegistration(127.0.0.1, 
datanodeUuid=3e441a70-defc-4b2f-bf7f-95c351d97a39, infoPort=45594, ipcPort=58299, storageInfo=lv=-56;cid=testClusterID;nsid=349984326;c=0), blocks: 0, hasStaleStorages: true, processing time: 1 msecs [junit4] 2> 135511 T2097 oahhsb.BlockManager.processReport BLOCK* processReport: from storage DS-7b0f0027-583c-462f-9b9c-e6f13ca8a160 node DatanodeRegistration(127.0.0.1, datanodeUuid=3e441a70-defc-4b2f-bf7f-95c351d97a39, infoPort=45594, ipcPort=58299, storageInfo=lv=-56;cid=testClusterID;nsid=349984326;c=0), blocks: 0, hasStaleStorages: false, processing time: 0 msecs [junit4] 2> 135786 T2061 oas.SolrTestCaseJ4.setUp ###Starting testBasic [junit4] 2> 135956 T2061 oassh.HdfsDirectory.<init> WARN The NameNode is in SafeMode - Solr will wait 5 seconds and try again. [junit4] 2> 141161 T2061 oas.SolrTestCaseJ4.tearDown ###Ending testBasic [junit4] 2> NOTE: reproduce with: ant test -Dtestcase=HdfsLockFactoryTest -Dtests.method=testBasic -Dtests.seed=B48BC404BF6BB3F1 -Dtests.slow=true -Dtests.locale=de_GR -Dtests.timezone=Antarctica/Vostok -Dtests.asserts=true -Dtests.file.encoding=ISO-8859-1 [junit4] ERROR 5.48s J5 | HdfsLockFactoryTest.testBasic <<< [junit4] > Throwable #1: org.apache.lucene.store.LockObtainFailedException: this lock instance was already obtained [junit4] > at __randomizedtesting.SeedInfo.seed([B48BC404BF6BB3F1:1F71D91160B735DF]:0) [junit4] > at org.apache.solr.store.hdfs.HdfsLockFactory$HdfsLock.obtain(HdfsLockFactory.java:71) [junit4] > at org.apache.solr.store.hdfs.HdfsLockFactoryTest.testBasic(HdfsLockFactoryTest.java:75) [junit4] > at java.lang.Thread.run(Thread.java:745) [junit4] 2> 141271 T2061 oas.SolrTestCaseJ4.setUp ###Starting testDoubleObtain [junit4] 2> 141393 T2061 oassh.HdfsDirectory.close Closing hdfs directory hdfs://localhost:40796/basedir/lock [junit4] 2> 141394 T2061 oas.SolrTestCaseJ4.tearDown ###Ending testDoubleObtain [junit4] 2> 141394 T2061 oahhsd.DirectoryScanner.shutdown WARN DirectoryScanner: shutdown has been called [junit4] 
2> 141435 T2061 oml.Slf4jLog.info Stopped HttpServer2$SelectChannelConnectorWithSafeStartup@localhost:0 [junit4] 2> 141436 T2133 oahh.HttpServer2$SelectChannelConnectorWithSafeStartup.isRunning WARN HttpServer Acceptor: isRunning is false. Rechecking. [junit4] 2> 141437 T2133 oahh.HttpServer2$SelectChannelConnectorWithSafeStartup.isRunning WARN HttpServer Acceptor: isRunning is false [junit4] 2> 141538 T2143 oahhsd.BPServiceActor.offerService WARN BPOfferService for Block pool BP-1715631016-127.0.1.1-1432850493139 (Datanode Uuid 3e441a70-defc-4b2f-bf7f-95c351d97a39) service to localhost/127.0.0.1:40796 interrupted [junit4] 2> 141538 T2143 oahhsd.BPServiceActor.run WARN Ending block pool service for: Block pool BP-1715631016-127.0.1.1-1432850493139 (Datanode Uuid 3e441a70-defc-4b2f-bf7f-95c351d97a39) service to localhost/127.0.0.1:40796 [junit4] 2> 141542 T2061 oahhsd.DirectoryScanner.shutdown WARN DirectoryScanner: shutdown has been called [junit4] 2> 141559 T2061 oml.Slf4jLog.info Stopped HttpServer2$SelectChannelConnectorWithSafeStartup@localhost:0 [junit4] 2> 141559 T2109 oahh.HttpServer2$SelectChannelConnectorWithSafeStartup.isRunning WARN HttpServer Acceptor: isRunning is false. Rechecking. 
[junit4] 2> 141571 T2109 oahh.HttpServer2$SelectChannelConnectorWithSafeStartup.isRunning WARN HttpServer Acceptor: isRunning is false
[junit4] 2> 141580 T2116 oahhsd.BPServiceActor.offerService WARN BPOfferService for Block pool BP-1715631016-127.0.1.1-1432850493139 (Datanode Uuid 183b04a7-dc21-4d3e-ad8c-52569c645c0d) service to localhost/127.0.0.1:40796 interrupted
[junit4] 2> 141588 T2116 oahhsd.BPServiceActor.run WARN Ending block pool service for: Block pool BP-1715631016-127.0.1.1-1432850493139 (Datanode Uuid 183b04a7-dc21-4d3e-ad8c-52569c645c0d) service to localhost/127.0.0.1:40796
[junit4] 2> 141598 T2087 oahhsb.DecommissionManager$Monitor.run WARN Monitor interrupted: java.lang.InterruptedException: sleep interrupted
[junit4] 2> 141616 T2061 oml.Slf4jLog.info Stopped HttpServer2$SelectChannelConnectorWithSafeStartup@localhost:0
[junit4] 2> 141722 T2061 oahml.MethodMetric$2.snapshot ERROR Error invoking method getBlocksTotal java.lang.reflect.InvocationTargetException
[junit4] 2>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[junit4] 2>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
[junit4] 2>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
[junit4] 2>         at java.lang.reflect.Method.invoke(Method.java:497)
[junit4] 2>         at org.apache.hadoop.metrics2.lib.MethodMetric$2.snapshot(MethodMetric.java:111)
[junit4] 2>         at org.apache.hadoop.metrics2.lib.MethodMetric.snapshot(MethodMetric.java:144)
[junit4] 2>         at org.apache.hadoop.metrics2.lib.MetricsRegistry.snapshot(MetricsRegistry.java:387)
[junit4] 2>         at org.apache.hadoop.metrics2.lib.MetricsSourceBuilder$1.getMetrics(MetricsSourceBuilder.java:79)
[junit4] 2>         at org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.getMetrics(MetricsSourceAdapter.java:195)
[junit4] 2>         at org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.updateJmxCache(MetricsSourceAdapter.java:172)
[junit4] 2>         at org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.getMBeanInfo(MetricsSourceAdapter.java:151)
[junit4] 2>         at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getClassName(DefaultMBeanServerInterceptor.java:1804)
[junit4] 2>         at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.safeGetClassName(DefaultMBeanServerInterceptor.java:1595)
[junit4] 2>         at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.checkMBeanPermission(DefaultMBeanServerInterceptor.java:1813)
[junit4] 2>         at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.exclusiveUnregisterMBean(DefaultMBeanServerInterceptor.java:430)
[junit4] 2>         at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.unregisterMBean(DefaultMBeanServerInterceptor.java:415)
[junit4] 2>         at com.sun.jmx.mbeanserver.JmxMBeanServer.unregisterMBean(JmxMBeanServer.java:546)
[junit4] 2>         at org.apache.hadoop.metrics2.util.MBeans.unregister(MBeans.java:81)
[junit4] 2>         at org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.stopMBeans(MetricsSourceAdapter.java:227)
[junit4] 2>         at org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.stop(MetricsSourceAdapter.java:212)
[junit4] 2>         at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.stopSources(MetricsSystemImpl.java:461)
[junit4] 2>         at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.stop(MetricsSystemImpl.java:212)
[junit4] 2>         at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.shutdown(MetricsSystemImpl.java:592)
[junit4] 2>         at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.shutdownInstance(DefaultMetricsSystem.java:72)
[junit4] 2>         at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.shutdown(DefaultMetricsSystem.java:68)
[junit4] 2>         at org.apache.hadoop.hdfs.server.namenode.metrics.NameNodeMetrics.shutdown(NameNodeMetrics.java:145)
[junit4] 2>         at org.apache.hadoop.hdfs.server.namenode.NameNode.stop(NameNode.java:822)
[junit4] 2>         at org.apache.hadoop.hdfs.MiniDFSCluster.shutdown(MiniDFSCluster.java:1720)
[junit4] 2>         at org.apache.hadoop.hdfs.MiniDFSCluster.shutdown(MiniDFSCluster.java:1699)
[junit4] 2>         at org.apache.solr.cloud.hdfs.HdfsTestUtil.teardownClass(HdfsTestUtil.java:197)
[junit4] 2>         at org.apache.solr.store.hdfs.HdfsLockFactoryTest.afterClass(HdfsLockFactoryTest.java:52)
[junit4] 2>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[junit4] 2>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
[junit4] 2>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
[junit4] 2>         at java.lang.reflect.Method.invoke(Method.java:497)
[junit4] 2>         at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1627)
[junit4] 2>         at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:799)
[junit4] 2>         at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
[junit4] 2>         at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57)
[junit4] 2>         at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
[junit4] 2>         at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
[junit4] 2>         at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42)
[junit4] 2>         at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
[junit4] 2>         at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
[junit4] 2>         at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
[junit4] 2>         at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
[junit4] 2>         at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
[junit4] 2>         at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:54)
[junit4] 2>         at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
[junit4] 2>         at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65)
[junit4] 2>         at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55)
[junit4] 2>         at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
[junit4] 2>         at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:365)
[junit4] 2>         at java.lang.Thread.run(Thread.java:745)
[junit4] 2> Caused by: java.lang.NullPointerException
[junit4] 2>         at org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap.size(BlocksMap.java:198)
[junit4] 2>         at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.getTotalBlocks(BlockManager.java:3291)
[junit4] 2>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlocksTotal(FSNamesystem.java:6223)
[junit4] 2>         ... 54 more
[junit4] 2>
[junit4] 2> 141730 T2061 oas.SolrTestCaseJ4.deleteCore ###deleteCore
[junit4] 2> NOTE: leaving temporary files on disk at: /var/lib/jenkins/jobs/Lucene-Solr-tests-5.2-Java8/workspace/solr/build/solr-core/test/J5/temp/solr.store.hdfs.HdfsLockFactoryTest B48BC404BF6BB3F1-001
[junit4] 2> 18586 T2060 ccr.ThreadLeakControl.checkThreadLeaks WARNING Will linger awaiting termination of 2 leaked thread(s).
[junit4] 2> NOTE: test params are: codec=CheapBastard, sim=DefaultSimilarity, locale=de_GR, timezone=Antarctica/Vostok
[junit4] 2> NOTE: Linux 4.0.2 amd64/Oracle Corporation 1.8.0_45 (64-bit)/cpus=16,threads=3,free=274346960,total=521142272
[junit4] 2> NOTE: All tests run in this JVM: [TestDistributedMissingSort, TestRandomMergePolicy, TestSolrConfigHandlerCloud, TestSweetSpotSimilarityFactory, TestSolrJ, TestStressLucene, TestBinaryResponseWriter, TestIndexSearcher, HdfsLockFactoryTest]
[junit4] Completed [164/497] on J5 in 28.44s, 2 tests, 1 error <<< FAILURES!
        Steve Rowe added a comment -

        Also the seed repros for me, on OS X.
        Anshum Gupta added a comment -

        I can reproduce the same issue too. Hit this while creating the RC.
        ASF subversion and git services added a comment -

        Commit 1682352 from Robert Muir in branch 'dev/trunk'
        [ https://svn.apache.org/r1682352 ]

        LUCENE-6507: fix test bug to not double-obtain. testDoubleObtain already tests that
        ASF subversion and git services added a comment -

        Commit 1682353 from Robert Muir in branch 'dev/branches/branch_5x'
        [ https://svn.apache.org/r1682353 ]

        LUCENE-6507: fix test bug to not double-obtain. testDoubleObtain already tests that
        ASF subversion and git services added a comment -

        Commit 1682354 from Robert Muir in branch 'dev/branches/lucene_solr_5_2'
        [ https://svn.apache.org/r1682354 ]

        LUCENE-6507: fix test bug to not double-obtain. testDoubleObtain already tests that
        Mark Miller added a comment -

        I can reproduce the same issue too. Hit this while creating the RC.

        Just a change in API behavior. Previously a double obtain was returning false and now it's throwing an exception.
        Robert Muir added a comment -

        I wouldn't go that far: it was discussed here, see the comments above.

        Any code doing this is really broken/stupid (example: the test in question). There is not a use-case.

        Previously you already had to be prepared for obtain() to throw IOException anyway for other stupid cases (the test did not do this), so it's not a problem that we detect this and give you a helpful exception that your code is broken.
        Mark Miller added a comment -

        Nope, I stand by that assessment
        Uwe Schindler added a comment -

        Double obtains were not really supported and the behaviour was.... mhm.... undefined. So I think it's better that Robert and Mike fixed it to be consistent.

        Unfortunately we just missed fixing this test, but that's already fixed!
        Robert Muir added a comment -

        Feel free to propose your use case, where there is valid code not handling the previous IOException, with some valid use case for calling obtain() on an already-obtained lock.

        Just test bugs.
        Mark Miller added a comment -

        You guys are just howling into the air...please reread and or get a clue.
        Anshum Gupta added a comment -

        I think all's sorted for now. Thanks everyone

        P.S: I've started the 5.2 RC2 build.
        Robert Muir added a comment -

        On the contrary, I already fixed the test (and mike had already added an explicit separate test for double-obtain for HDFS).

        Looks like the ones howling into the air are... the peanut gallery, not the do-ers.
        Mark Miller added a comment -

        I'll lend a hand and spell it out.

        Anshum asked if I'd look at this issue as it involves hdfs and the release.

        I looked at it. I found that:

        Just a change in API behavior. Previously a double obtain was returning false and now it's throwing an exception.

        This is true. I don't care how smart you think you are.

        By then, Robert had made a further commit.

        Beyond that, not much to see here. Chill out.
        Uwe Schindler added a comment -

        With LUCENE-6508, double obtains will be impossible in the future: you just get a Lock instance if you have actually locked the directory (using Directory#obtainLock(name)). This lock class is immutable and once released with close() it's gone. Because the new Lock class has no "obtain" anymore, double obtains are impossible.

        But as this new issue takes longer, we just did a "quick" fix to make 5.2 release-able.
        Mark Miller added a comment -

        Looks like the ones howling into the air are... the peanut gallery, not the do-ers.

        Nah, you are just being a dick for no reason as usual. Normal day in Lucene land.
        Michael McCandless added a comment -

        Argh, thank you for fixing the HDFSLockFactory failure.
        Michael McCandless added a comment -

        So ... here's a 4.10.x backport patch, but it was kinda messy: lots of
        conflicts because we've basically already rewritten locking once in 5.x.

        I stuck with java.io APIs (File) instead of converting to NIO.2 APIs
        (Path). I also back-ported AssertingLock to MockDirectoryWrapper.

        This patch breaks NativeFSLockFactory.clearLock: its impl "relied" on
        this "when I close I nuke any other locks" behavior, and I had to
        remove one test case in the facets module that was doing this. The
        API is deprecated (gone in 5.x) but it still feels wrong to break it on
        such an old bugfix branch...

        Net/net this is a biggish change, and I don't think we should backport
        it to 4.10.x: this branch is very old now, and this change is too risky.
        Robert Muir added a comment -

        Thanks for investigating mike. I looked at the backport patch and saw some corner case bugs still, like calling getCanonicalPath before ensuring the file is created already (not truly canonical). I agree, this seems too risky. Let's just proceed with 5.x and leave it broken there.
        Anshum Gupta added a comment -

        Bulk close for 5.2.0.

          People

          • Assignee:
            Unassigned
            Reporter:
            Simon Willnauer
          • Votes:
            0
            Watchers:
            5

            Dates

            • Created:
              Updated:
              Resolved:
