Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.9.3, 3.0.2, 3.1, 4.0-ALPHA
    • Component/s: core/index
    • Labels:
      None
    • Lucene Fields:
      New, Patch Available

      Description

      NativeFSLock create a test lock file which its name might collide w/ another JVM that is running. Very unlikely, but still it happened a couple of times already, since the tests were parallelized. This may result in a false exception thrown from release(), when the lock file's delete() is called and returns false, because the file does not exist (deleted by another JVM already). In addition, release() should give a second attempt to delete() if it fails, since the file may be held temporarily by another process (like AntiVirus) before it fails. The proposed changes are:

      1) Use ManagementFactory.getRuntimeMXBean().getName() as part of the test lock name (should include the process Id)
      2) In release(), if delete() fails, check if the file indeed exists. If it is, let's attempt a re-delete() few ms later.
      3) If (3) still fails, throw an exception. Alternatively, we can attempt a deleteOnExit.

      I'll post a patch later today.

      1. LUCENE-2421-2.patch
        1 kB
        Shai Erera
      2. LUCENE-2421.patch
        4 kB
        Shai Erera
      3. LUCENE-2421.patch
        5 kB
        Shai Erera
      4. LUCENE-2421.patch
        3 kB
        Shai Erera
      5. LUCENE-2421.patch
        3 kB
        Shai Erera

        Activity

        Hide
        Shai Erera added a comment -

        Patch w/ the proposed changes and a fallback to deleteOnExit in acquireTestLock, if deleting the test lock fails still.

        BTW, I've used 100ms for the re-delete attempt delay. Do you think that's enough, or should I use a lower value, or perhaps IndexWriterConfig.getDefaultWriteLockTimeout (which is 1s unless changed)?

        Show
        Shai Erera added a comment - Patch w/ the proposed changes and a fallback to deleteOnExit in acquireTestLock, if deleting the test lock fails still. BTW, I've used 100ms for the re-delete attempt delay. Do you think that's enough, or should I use a lower value, or perhaps IndexWriterConfig.getDefaultWriteLockTimeout (which is 1s unless changed)?
        Hide
        Shai Erera added a comment -

        I'm having second thoughts about using MXBean, since the name returned includes invalid filename characters such as @. Will need to check that next time I'm infront of the code, but perhaps w/ all the other changes that might not be necessary ...

        Show
        Shai Erera added a comment - I'm having second thoughts about using MXBean, since the name returned includes invalid filename characters such as @. Will need to check that next time I'm infront of the code, but perhaps w/ all the other changes that might not be necessary ...
        Hide
        Michael McCandless added a comment -

        I think 100 msec is way too long – I was thinking more like 10 msec

        Show
        Michael McCandless added a comment - I think 100 msec is way too long – I was thinking more like 10 msec
        Hide
        Shai Erera added a comment -

        Well ... we are talking about a fallback, last chance code, that should hardly ever be executed (i.e., I'm pretty sure the delete() failed b/c the file wasn't there anymore). So if we've reached that situation, I think we should give a reasonable "last chance"? So that if the file can't be deleted since it's locked by AntiVirus, we should be on the safe side?

        But I don't have any strong feelings for it - if you think 10 ms are enough, I'll change it.

        Show
        Shai Erera added a comment - Well ... we are talking about a fallback, last chance code, that should hardly ever be executed (i.e., I'm pretty sure the delete() failed b/c the file wasn't there anymore). So if we've reached that situation, I think we should give a reasonable "last chance"? So that if the file can't be deleted since it's locked by AntiVirus, we should be on the safe side? But I don't have any strong feelings for it - if you think 10 ms are enough, I'll change it.
        Hide
        Michael McCandless added a comment -

        I guess since it's a last resort "we think will very rarely happen in practice" thing... we can leave it @ 100 msec. Just makes me nervous Libraries shouldn't be doing things like sleeping on you (for so long)...

        Hmm – after that sleep, we still throw LockReleaseFailedException if we are still unable to delete it? Isn't it "harmless" if we can't delete it?

        Also, rather than swallowing the InterruptedException, you should immediately throw ThreadInterruptedException – we fixed Lucene a while back to not swallow this exception (LUCENE-2053).

        Show
        Michael McCandless added a comment - I guess since it's a last resort "we think will very rarely happen in practice" thing... we can leave it @ 100 msec. Just makes me nervous Libraries shouldn't be doing things like sleeping on you (for so long)... Hmm – after that sleep, we still throw LockReleaseFailedException if we are still unable to delete it? Isn't it "harmless" if we can't delete it? Also, rather than swallowing the InterruptedException, you should immediately throw ThreadInterruptedException – we fixed Lucene a while back to not swallow this exception ( LUCENE-2053 ).
        Hide
        Shai Erera added a comment -

        Libraries shouldn't be doing things like sleeping on you (for so long)...

        Well ... I think they'd appreciate the sleep instead of the immediate exception. Previuosly, I've been asked by customers to do that in other places. A 100ms sleep is barely noticeable (being so rare), but an exception definitely is!

        Isn't it "harmless" if we can't delete it?

        No it isn't. I distinguish between two cases: the regular and test lock. For the regular lock, we've already agreed we cannot leave it behind as it will prevent future locking. For the test lock I catch the exception in acquireTestLock and call deleteOnExit() suppressing the exception.

        you should immediately throw ThreadInterruptedException

        ok, thought it'd be harmless here to ignore it since I'm only doing last resort actions. But if that's common practice now, I will throw the exception.

        Thanks for the comments !

        Show
        Shai Erera added a comment - Libraries shouldn't be doing things like sleeping on you (for so long)... Well ... I think they'd appreciate the sleep instead of the immediate exception. Previuosly, I've been asked by customers to do that in other places. A 100ms sleep is barely noticeable (being so rare), but an exception definitely is! Isn't it "harmless" if we can't delete it? No it isn't. I distinguish between two cases: the regular and test lock. For the regular lock, we've already agreed we cannot leave it behind as it will prevent future locking. For the test lock I catch the exception in acquireTestLock and call deleteOnExit() suppressing the exception. you should immediately throw ThreadInterruptedException ok, thought it'd be harmless here to ignore it since I'm only doing last resort actions. But if that's common practice now, I will throw the exception. Thanks for the comments !
        Hide
        Michael McCandless added a comment -

        Well ... I think they'd appreciate the sleep instead of the immediate exception. Previuosly, I've been asked by customers to do that in other places. A 100ms sleep is barely noticeable (being so rare), but an exception definitely is

        Yeah I agree we should try to be robust to the "diverse" installations out there, like AVs that tie up our files.

        For the regular lock, we've already agreed we cannot leave it behind as it will prevent future locking

        Wait, how come? (SimpleFSLF does this, but native gets a "real" lock).

        For the test lock I catch the exception in acquireTestLock and call deleteOnExit() suppressing the exception.

        OK.

        Show
        Michael McCandless added a comment - Well ... I think they'd appreciate the sleep instead of the immediate exception. Previuosly, I've been asked by customers to do that in other places. A 100ms sleep is barely noticeable (being so rare), but an exception definitely is Yeah I agree we should try to be robust to the "diverse" installations out there, like AVs that tie up our files. For the regular lock, we've already agreed we cannot leave it behind as it will prevent future locking Wait, how come? (SimpleFSLF does this, but native gets a "real" lock). For the test lock I catch the exception in acquireTestLock and call deleteOnExit() suppressing the exception. OK.
        Hide
        Mark Miller added a comment -

        Wait, how come? (SimpleFSLF does this, but native gets a "real" lock).

        Yeah, this doesn't make sense to me either - native file locks don't care if the file is preexisting or not. It shouldnt matter if the file is already there or needs to be created. Its just a dummy file to get the lock on - even you start Lucene and the file exists, there will be no lock on it, so it won't matter.

        Show
        Mark Miller added a comment - Wait, how come? (SimpleFSLF does this, but native gets a "real" lock). Yeah, this doesn't make sense to me either - native file locks don't care if the file is preexisting or not. It shouldnt matter if the file is already there or needs to be created. Its just a dummy file to get the lock on - even you start Lucene and the file exists, there will be no lock on it, so it won't matter.
        Hide
        Shai Erera added a comment -

        I guess that you've missed that on the thread: "It is possible that two JVMs will attempt to lock the same Directory, one w/ Native and the other w/ Simple. If we won't check in obtain() whether the file exists, it might obtain a native lock, while the Directory is actually locked by another JVM using Simple". Uwe also mentioned Native was fixed to use the same lock file name in 2.9 because of that.

        Another thing why we cannot leave the lock file behind is because if you e.g. switch from Native to Simple you won't be able to obtain a lock.

        And personally I prefer that if the Directory is not locked then the file won't be there - even if just for clarity, or because how we've all become used to treat the existence of the lock file by now. And I'd also hate to add another line to bw section

        Show
        Shai Erera added a comment - I guess that you've missed that on the thread: " It is possible that two JVMs will attempt to lock the same Directory, one w/ Native and the other w/ Simple. If we won't check in obtain() whether the file exists, it might obtain a native lock, while the Directory is actually locked by another JVM using Simple ". Uwe also mentioned Native was fixed to use the same lock file name in 2.9 because of that. Another thing why we cannot leave the lock file behind is because if you e.g. switch from Native to Simple you won't be able to obtain a lock. And personally I prefer that if the Directory is not locked then the file won't be there - even if just for clarity, or because how we've all become used to treat the existence of the lock file by now. And I'd also hate to add another line to bw section
        Hide
        Shai Erera added a comment -

        Patch adds throwin ThreadInterruptedException.

        About using MXBean name, I wrote this code:

        String name = "d:/temp/file-" + ManagementFactory.getRuntimeMXBean().getName();
        new File(name).createNewFile();
        System.out.println(new File(name).exists());
        

        The file created included the @ character, which is valid on Windows (and also verified on Linux). But I'm still not sure about using it, because if a system returns a character that is not a valid file character, we may not get the same file name we think ... so maybe if we still want to use it, we should move it to the end of the string, so that the name includes the random number for sure ... what do you think?

        Show
        Shai Erera added a comment - Patch adds throwin ThreadInterruptedException. About using MXBean name, I wrote this code: String name = "d:/temp/file-" + ManagementFactory.getRuntimeMXBean().getName(); new File(name).createNewFile(); System .out.println( new File(name).exists()); The file created included the @ character, which is valid on Windows (and also verified on Linux). But I'm still not sure about using it, because if a system returns a character that is not a valid file character, we may not get the same file name we think ... so maybe if we still want to use it, we should move it to the end of the string, so that the name includes the random number for sure ... what do you think?
        Hide
        Mark Miller added a comment -

        I guess that you've missed that on the thread

        No, just havn't seen a good reason yet...

        "It is possible that two JVMs will attempt to lock the same Directory, one w/ Native and the other w/ Simple. If we won't check in obtain() whether the file exists, it might obtain a native lock, while the Directory is actually locked by another JVM using Simple". Uwe also mentioned Native was fixed to use the same lock file name in 2.9 because of that.

        Lock factories do not have to work with all other lock factories...you need to use the same lock factory across all process'.

        Another thing why we cannot leave the lock file behind is because if you e.g. switch from Native to Simple you won't be able to obtain a lock.

        Not true - there is no reason both simple and native need to use the same lock file - or even that the native lock feel needs to be in the same dir (eg it could be in a tmp dir)

        And personally I prefer that if the Directory is not locked then the file won't be there

        You cannot guarantee this in any case. If they cannot be deleted, it cannot be deleted - but cleanliness is no reason to throw an exception in an event that is not actually a failure situation.

        even if just for clarity, or because how we've all become used to treat the existence of the lock file by now.

        I don't find that persuasive myself...

        And I'd also hate to add another line to bw section

        but its not a back compat break ...

        Show
        Mark Miller added a comment - I guess that you've missed that on the thread No, just havn't seen a good reason yet... "It is possible that two JVMs will attempt to lock the same Directory, one w/ Native and the other w/ Simple. If we won't check in obtain() whether the file exists, it might obtain a native lock, while the Directory is actually locked by another JVM using Simple". Uwe also mentioned Native was fixed to use the same lock file name in 2.9 because of that. Lock factories do not have to work with all other lock factories...you need to use the same lock factory across all process'. Another thing why we cannot leave the lock file behind is because if you e.g. switch from Native to Simple you won't be able to obtain a lock. Not true - there is no reason both simple and native need to use the same lock file - or even that the native lock feel needs to be in the same dir (eg it could be in a tmp dir) And personally I prefer that if the Directory is not locked then the file won't be there You cannot guarantee this in any case. If they cannot be deleted, it cannot be deleted - but cleanliness is no reason to throw an exception in an event that is not actually a failure situation. even if just for clarity, or because how we've all become used to treat the existence of the lock file by now. I don't find that persuasive myself... And I'd also hate to add another line to bw section but its not a back compat break ...
        Hide
        Shai Erera added a comment -

        Lock factories do not have to work with all other lock factories...you need to use the same lock factory across all process'.

        I agree, but I've grown accustomed here that we try to protect users from their own mistakes. I.e., if the factory is a parameter to the application, this can certainly happen, w/o any of the application owners even realize that ... so it just seems dangerous to me, and I think that the alternative of keeping the file there and failing elsewhere is not good.

        Another scenario - one application opens the index on a local file system (which is shared) and uses Native, while the other, for safety reason, opens the index using Simple lock because it reads the index from a remote share ... does not make a lot of sense - but possible.

        or even that the native lock feel needs to be in the same dir (eg it could be in a tmp dir)

        Good point - but it's irrelevant to that issue - one can already shoot himself in the leg today by doing that. That's something I don't think we can protect from ... but allowing two lock files in the same directory - that just seems like a bug. Still, one could impl MyOwnNativeFSLock and create "my.lock.file" ... but we cannot protect from that either. So at least, and that's my own opinion, the core factories that are provided w/ Lucene should behave well.

        So Mark - I agreed w/ you before on that, until Uwe brought up the reason why NativeFSLock was fixed in 2.9 to use the same lock file as Simple ... it seems like a good catch to me, and I think if it was fixed once already, do we really want to break that fix? That will certainly be a bw break, because now NativeFSLock could obtain a lock if a lock file is there (I'm not sure if we have tests that cover that scenario, but I have a feeling some will fail if we change that).

        If the result of that discussion is that we should not fail if the lock file cannot be deleted, then I think we should rename it too, so that Native and Simple use different files and it is clear that you're using two different LFs, which is not supported?

        Show
        Shai Erera added a comment - Lock factories do not have to work with all other lock factories...you need to use the same lock factory across all process'. I agree, but I've grown accustomed here that we try to protect users from their own mistakes. I.e., if the factory is a parameter to the application, this can certainly happen, w/o any of the application owners even realize that ... so it just seems dangerous to me, and I think that the alternative of keeping the file there and failing elsewhere is not good. Another scenario - one application opens the index on a local file system (which is shared) and uses Native, while the other, for safety reason, opens the index using Simple lock because it reads the index from a remote share ... does not make a lot of sense - but possible. or even that the native lock feel needs to be in the same dir (eg it could be in a tmp dir) Good point - but it's irrelevant to that issue - one can already shoot himself in the leg today by doing that. That's something I don't think we can protect from ... but allowing two lock files in the same directory - that just seems like a bug. Still, one could impl MyOwnNativeFSLock and create "my.lock.file" ... but we cannot protect from that either. So at least, and that's my own opinion, the core factories that are provided w/ Lucene should behave well. So Mark - I agreed w/ you before on that, until Uwe brought up the reason why NativeFSLock was fixed in 2.9 to use the same lock file as Simple ... it seems like a good catch to me, and I think if it was fixed once already, do we really want to break that fix? That will certainly be a bw break, because now NativeFSLock could obtain a lock if a lock file is there (I'm not sure if we have tests that cover that scenario, but I have a feeling some will fail if we change that). If the result of that discussion is that we should not fail if the lock file cannot be deleted, then I think we should rename it too, so that Native and Simple use different files and it is clear that you're using two different LFs, which is not supported?
        Hide
        Uwe Schindler added a comment - - edited

        The file created included the @ character, which is valid on Windows (and also verified on Linux). But I'm still not sure about using it, because if a system returns a character that is not a valid file character, we may not get the same file name we think ... so maybe if we still want to use it, we should move it to the end of the string, so that the name includes the random number for sure ... what do you think?

        I suggested on IRC to take the output of getName() and do a regexp replace and replace all non ACSII-digit chars:

        ManagementFactory.getRuntimeMXBean().getName().replaceAll("[^a..zA..Z0..9]+","")
        

        Or alternatively simply use the hashCode of that String (but that is less unique).

        Show
        Uwe Schindler added a comment - - edited The file created included the @ character, which is valid on Windows (and also verified on Linux). But I'm still not sure about using it, because if a system returns a character that is not a valid file character, we may not get the same file name we think ... so maybe if we still want to use it, we should move it to the end of the string, so that the name includes the random number for sure ... what do you think? I suggested on IRC to take the output of getName() and do a regexp replace and replace all non ACSII-digit chars: ManagementFactory.getRuntimeMXBean().getName().replaceAll( "[^a..zA..Z0..9]+" ,"") Or alternatively simply use the hashCode of that String (but that is less unique).
        Hide
        Mark Miller added a comment -

        I'm still confused on the native/simple working together thing - I don't see where native will respect simple's lock? Am I missing something?

        Also, its almost more confusing that simple will respect the native lock file, when that file is not even native's actual lock - just its medium for a lock. If they could really work together, why even have a native lock?

        Show
        Mark Miller added a comment - I'm still confused on the native/simple working together thing - I don't see where native will respect simple's lock? Am I missing something? Also, its almost more confusing that simple will respect the native lock file, when that file is not even native's actual lock - just its medium for a lock. If they could really work together, why even have a native lock?
        Hide
        Shai Erera added a comment -

        You're right Mark ! I did not do a thorough check on NativeFSLock.obtain(). In the beginning of the method, it calls lockExists() and I assumed it checks for the existence of the file. But it actually checks whether the FileLock is not null. So that means:

        • If one obtains a Native lock and later a Simple lock obtain is attempted - it will fail.
        • If one obtains a Simple lock and later a Native lock obtain is attempted - it will succeed.

        I wrote the following code to demonstrate that:

        SimpleFSLockFactory simple = new SimpleFSLockFactory(dir);
        NativeFSLockFactory nativel = new NativeFSLockFactory(dir);
        System.out.println(nativel.makeLock("test").obtain());
        System.out.println(simple.makeLock("test").obtain());
        

        This prints "true" and "false" while if you move the simple.makeLock line above the native, it prints "true" twice.

        I don't know if that's a problem or not because it all boils down to whether we want to be nice if the user has made a mistake and used two lock factories in two different places of the code.

        Given that, if we are ok to declare this is unsupported in the sense that the code won't play nice, then I'm ok w/ not failing if the lock file deletion fails, for both the regular and test lock. I think it makes sense to decide the code doesn't play nice, because someone can anyway extend LF and do such silly mistakes ...

        Show
        Shai Erera added a comment - You're right Mark ! I did not do a thorough check on NativeFSLock.obtain(). In the beginning of the method, it calls lockExists() and I assumed it checks for the existence of the file. But it actually checks whether the FileLock is not null. So that means: If one obtains a Native lock and later a Simple lock obtain is attempted - it will fail. If one obtains a Simple lock and later a Native lock obtain is attempted - it will succeed. I wrote the following code to demonstrate that: SimpleFSLockFactory simple = new SimpleFSLockFactory(dir); NativeFSLockFactory nativel = new NativeFSLockFactory(dir); System .out.println(nativel.makeLock( "test" ).obtain()); System .out.println(simple.makeLock( "test" ).obtain()); This prints "true" and "false" while if you move the simple.makeLock line above the native, it prints "true" twice. I don't know if that's a problem or not because it all boils down to whether we want to be nice if the user has made a mistake and used two lock factories in two different places of the code. Given that, if we are ok to declare this is unsupported in the sense that the code won't play nice, then I'm ok w/ not failing if the lock file deletion fails, for both the regular and test lock. I think it makes sense to decide the code doesn't play nice, because someone can anyway extend LF and do such silly mistakes ...
        Hide
        Shai Erera added a comment -

        Patch includes:

        • Not throwing exception if the delete() fails
        • Added test case to TestLockFactory
        • Record in Runtime Behavior in CHANGES
        Show
        Shai Erera added a comment - Patch includes: Not throwing exception if the delete() fails Added test case to TestLockFactory Record in Runtime Behavior in CHANGES
        Hide
        Michael McCandless added a comment -

        Hmm so if failure to delete is no longer a serious issue (ie we are not going to throw an exception), do we really need to sleep for a long time & retry the deletion any more?

        This is all based on a hypothetical situation right ("we may fail to delete the test file we just created, and it still exists")? Maybe we should wait until this is really an issue (ie, a user actually hits it), before implementing this workaround code?

        Show
        Michael McCandless added a comment - Hmm so if failure to delete is no longer a serious issue (ie we are not going to throw an exception), do we really need to sleep for a long time & retry the deletion any more? This is all based on a hypothetical situation right ("we may fail to delete the test file we just created, and it still exists")? Maybe we should wait until this is really an issue (ie, a user actually hits it), before implementing this workaround code?
        Hide
        Shai Erera added a comment -

        That's just so that we "behave nicely" with the application and not leave behind the lock file. I'm thinking that if you see a write.lock file in the directory, it makes you wonder why it's there, and if the index was closed properly or not etc. The existence of the file does not tell you anything about which LF is used (Native or Simple) and so you cannot know for sure what went wrong. Going the extra mile, even if it's only in rare cases, seems harmless, no?

        I don't think it's critical, but I also don't think the code complicates matters?

        Show
        Shai Erera added a comment - That's just so that we "behave nicely" with the application and not leave behind the lock file. I'm thinking that if you see a write.lock file in the directory, it makes you wonder why it's there, and if the index was closed properly or not etc. The existence of the file does not tell you anything about which LF is used (Native or Simple) and so you cannot know for sure what went wrong. Going the extra mile, even if it's only in rare cases, seems harmless, no? I don't think it's critical, but I also don't think the code complicates matters?
        Hide
        Michael McCandless added a comment -

        Well sleeping for so long isn't necessarily harmless

        And it's code to fix a problem that's only theoretical at this point – we've never seen this actually happen – so this violates 'design for today'. Why fix something that may not even be an actual problem...

        Also the lock file name has "-test" in it, which is a strong hint to the future hypothetical user that hits this.

        Show
        Michael McCandless added a comment - Well sleeping for so long isn't necessarily harmless And it's code to fix a problem that's only theoretical at this point – we've never seen this actually happen – so this violates 'design for today'. Why fix something that may not even be an actual problem... Also the lock file name has "-test" in it, which is a strong hint to the future hypothetical user that hits this.
        Hide
        Shai Erera added a comment -

        Ok, I'm convinced - let's design for today. Removed the "last chance" code.

        Show
        Shai Erera added a comment - Ok, I'm convinced - let's design for today. Removed the "last chance" code.
        Hide
        Uwe Schindler added a comment -

        Looks good!

        Show
        Uwe Schindler added a comment - Looks good!
        Hide
        Michael McCandless added a comment -

        Thanks Shai – this looks great!

        Show
        Michael McCandless added a comment - Thanks Shai – this looks great!
        Hide
        Shai Erera added a comment -

        Committed revision 940730.

        Show
        Shai Erera added a comment - Committed revision 940730.
        Hide
        Shai Erera added a comment -

        Back-port to 3.1 as well

        Show
        Shai Erera added a comment - Back-port to 3.1 as well
        Hide
        Shai Erera added a comment -

        Committed revision 941361 (backport to 3x).

        Show
        Shai Erera added a comment - Committed revision 941361 (backport to 3x).
        Hide
        Michael McCandless added a comment -

        backport

        Show
        Michael McCandless added a comment - backport
        Hide
        Shai Erera added a comment -

        Reopening to handle clearLock. Patch will be posted soon.

        Show
        Shai Erera added a comment - Reopening to handle clearLock. Patch will be posted soon.
        Hide
        Shai Erera added a comment -

        Patch fixes clearLock to first release the lock and then delete it. If the release fails, it means another process is holding the lock and therefore we shouldn't fail. If the delete fails, it ignores it.

        I plan to commit shortly.

        Show
        Shai Erera added a comment - Patch fixes clearLock to first release the lock and then delete it. If the release fails, it means another process is holding the lock and therefore we shouldn't fail. If the delete fails, it ignores it. I plan to commit shortly.
        Hide
        Shai Erera added a comment -

        Fixed on 3x and trunk

        Show
        Shai Erera added a comment - Fixed on 3x and trunk
        Hide
        Michael McCandless added a comment -

        So, I'm confused on how/why the test lock name would get a conflict
        with multiple JREs running our unit tests.

        Ie, each JRE runs tests in certain index directories. And the test
        lock is acquired in the index directory. So, if tests running in
        different JREs are sharing the same index directory, then more serious
        problems will ensue.

        I think it must have only been the test output formatter
        (LuceneJUnitResultFormatter) that was conflicting on the test lock
        name. Because this is the only locking usage that'd be sharing the
        same lock dir across JREs...

        Show
        Michael McCandless added a comment - So, I'm confused on how/why the test lock name would get a conflict with multiple JREs running our unit tests. Ie, each JRE runs tests in certain index directories. And the test lock is acquired in the index directory. So, if tests running in different JREs are sharing the same index directory, then more serious problems will ensue. I think it must have only been the test output formatter (LuceneJUnitResultFormatter) that was conflicting on the test lock name. Because this is the only locking usage that'd be sharing the same lock dir across JREs...
        Hide
        Shai Erera added a comment -

        From what I remember, it was due to LuceneJUnitResultFormatter, yes.

        I don't mind if we stop using j.l.management and move to either:

        • Seed the Random w/ System.nanoTime() or
        • Don't acquire a test lock.

        I don't remember if I ever tried the first approach. Your comment on LUCENE-2688 about not acquiring the test lock makes sense to me, but some testing would be required to confirm that I guess.

        Show
        Shai Erera added a comment - From what I remember, it was due to LuceneJUnitResultFormatter, yes. I don't mind if we stop using j.l.management and move to either: Seed the Random w/ System.nanoTime() or Don't acquire a test lock. I don't remember if I ever tried the first approach. Your comment on LUCENE-2688 about not acquiring the test lock makes sense to me, but some testing would be required to confirm that I guess.
        Hide
        Michael McCandless added a comment -

        I'll open a new issue, to remove acquireTestLock... all tests pass with it removed.

        Show
        Michael McCandless added a comment - I'll open a new issue, to remove acquireTestLock... all tests pass with it removed.
        Hide
        Grant Ingersoll added a comment -

        Bulk close for 3.1

        Show
        Grant Ingersoll added a comment - Bulk close for 3.1

          People

          • Assignee:
            Shai Erera
            Reporter:
            Shai Erera
          • Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development