Lucene - Core / LUCENE-5624

nightly 'test-lock-factory' may leak file handles

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.8, 4.9, 6.0
    • Component/s: core/store
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      https://builds.apache.org/job/Lucene-Solr-NightlyTests-trunk/556/console

      [LockStressTest1] Exception in thread "main" java.nio.file.FileSystemException: /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-NightlyTests-trunk/lucene/build/core/lockfactorytest/test-test.lock: Too many open files in system
      [LockStressTest1] 	at sun.nio.fs.UnixException.translateToIOException(UnixException.java:91)
      [LockStressTest1] 	at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
      [LockStressTest1] 	at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
      [LockStressTest1] 	at sun.nio.fs.UnixFileSystemProvider.newFileChannel(UnixFileSystemProvider.java:176)
      [LockStressTest1] 	at java.nio.channels.FileChannel.open(FileChannel.java:287)
      [LockStressTest1] 	at java.nio.channels.FileChannel.open(FileChannel.java:334)
      [LockStressTest1] 	at org.apache.lucene.store.NativeFSLock.obtain(NativeFSLockFactory.java:149)
      [LockStressTest1] 	at org.apache.lucene.store.VerifyingLockFactory$CheckedLock.obtain(VerifyingLockFactory.java:65)
      [LockStressTest1] 	at org.apache.lucene.store.Lock.obtain(Lock.java:77)
      [LockStressTest1] 	at org.apache.lucene.store.LockStressTest.main(LockStressTest.java:114)
      
      1. LUCENE-5624.patch (4 kB, Robert Muir)
      2. LUCENE-5624.patch (3 kB, Robert Muir)
      3. LUCENE-5624-test.patch (6 kB, Uwe Schindler)

        Activity

        Robert Muir added a comment -

        I will see if I can reproduce this by running it with a massive number of iterations.

        Robert Muir added a comment -

        This is probably important to look at before 4.8, just in case it's somehow the locking code itself causing the problem!

        Robert Muir added a comment -

        Let's also test SimpleFSLockFactory here. I want to know if it leaks too.

        Robert Muir added a comment -

        Reproduces easily for me: ant test-lock-factory -Dtests.nightly=true -Dtests.multiplier=100

        Robert Muir added a comment -

        The leak definitely seems to be lock files (not sockets or something); in the output of 'lsof' I see this:

        java      15416 15430      rmuir 1657w      REG                8,1         0    5617836 /home/rmuir/workspace/lucene-trunk/lucene/build/core/lockfactorytest/test-test.lock
        java      15416 15430      rmuir 1658w      REG                8,1         0    5617836 /home/rmuir/workspace/lucene-trunk/lucene/build/core/lockfactorytest/test-test.lock
        java      15416 15430      rmuir 1659w      REG                8,1         0    5617836 /home/rmuir/workspace/lucene-trunk/lucene/build/core/lockfactorytest/test-test.lock
        java      15416 15430      rmuir 1660w      REG                8,1         0    5617836 /home/rmuir/workspace/lucene-trunk/lucene/build/core/lockfactorytest/test-test.lock
        java      15416 15430      rmuir 1661w      REG                8,1         0    5617836 /home/rmuir/workspace/lucene-trunk/lucene/build/core/lockfactorytest/test-test.lock
        java      15416 15430      rmuir 1662w      REG                8,1         0    5617836 /home/rmuir/workspace/lucene-trunk/lucene/build/core/lockfactorytest/test-test.lock
        java      15416 15430      rmuir 1663w      REG                8,1         0    5617836 /home/rmuir/workspace/lucene-trunk/lucene/build/core/lockfactorytest/test-test.lock
        java      15416 15430      rmuir 1664w      REG                8,1         0    5617836 /home/rmuir/workspace/lucene-trunk/lucene/build/core/lockfactorytest/test-test.lock
        java      15416 15430      rmuir 1665w      REG                8,1         0    5617836 /home/rmuir/workspace/lucene-trunk/lucene/build/core/lockfactorytest/test-test.lock
        java      15416 15430      rmuir 1666w      REG                8,1         0    5617836 /home/rmuir/workspace/lucene-trunk/lucene/build/core/lockfactorytest/test-test.lock
        java      15416 15430      rmuir 1667w      REG                8,1         0    5617836 /home/rmuir/workspace/lucene-trunk/lucene/build/core/lockfactorytest/test-test.lock
        java      15416 15430      rmuir 1668w      REG                8,1         0    5617836 /home/rmuir/workspace/lucene-trunk/lucene/build/core/lockfactorytest/test-test.lock
        java      15416 15430      rmuir 1669w      REG                8,1         0    5617836 /home/rmuir/workspace/lucene-trunk/lucene/build/core/lockfactorytest/test-test.lock
        java      15416 15430      rmuir 1670w      REG                8,1         0    5617836 /home/rmuir/workspace/lucene-trunk/lucene/build/core/lockfactorytest/test-test.lock
        ...
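The same growth can also be watched from inside the JVM. Below is a minimal sketch, assuming a Unix JVM that exposes `com.sun.management.UnixOperatingSystemMXBean`; the `FdCounter` class is hypothetical and is not part of Lucene or its test harness.

```java
import com.sun.management.UnixOperatingSystemMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

// Hypothetical helper: reports this JVM's open file descriptor count, the
// in-process analogue of counting the lsof lines above. Only Unix JVMs expose it.
final class FdCounter {
  private FdCounter() {}

  /** Returns the number of open file descriptors, or -1 if the platform does not report it. */
  static long openFileDescriptors() {
    OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
    if (os instanceof UnixOperatingSystemMXBean) {
      return ((UnixOperatingSystemMXBean) os).getOpenFileDescriptorCount();
    }
    return -1; // e.g. on Windows
  }
}
```

Sampling it before and after a batch of obtain/release cycles would show the leak as a count that keeps climbing instead of returning to its starting value.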
        
        Robert Muir added a comment -

        If I run the test with -Dlock.factory.impl=org.apache.lucene.store.SimpleFSLockFactory, it always passes.

        Looks like a real leak in NativeFSLock!
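One plausible reason SimpleFSLockFactory is immune: it locks by creating a file rather than by holding an OS lock on an open channel, so no handle stays open between obtain and release. A minimal sketch of that style of lock, assuming an atomic `Files.createFile`; `CreateFileLock` is a hypothetical name, not Lucene's SimpleFSLock.

```java
import java.io.Closeable;
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical create-a-file lock in the spirit of SimpleFSLockFactory (not Lucene's code).
// The lock is the file's existence, so nothing stays open while it is held.
final class CreateFileLock implements Closeable {
  private final Path lockFile;
  private boolean held;

  CreateFileLock(Path lockFile) {
    this.lockFile = lockFile;
  }

  /** Atomically creates the lock file; returns false if it already exists. */
  boolean obtain() throws IOException {
    if (held) {
      return true; // already ours
    }
    try {
      Files.createFile(lockFile); // existence check + creation are one atomic step
      held = true;
    } catch (FileAlreadyExistsException e) {
      // someone else holds the lock; no handle was left open
    }
    return held;
  }

  /** Deletes the lock file if this instance holds it; repeated calls do nothing. */
  @Override
  public void close() throws IOException {
    if (held) {
      Files.deleteIfExists(lockFile);
      held = false;
    }
  }
}
```

The trade-off is that a crashed process leaves the lock file behind and it must be removed by hand, whereas the OS drops a native lock when the owning process exits.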

        Robert Muir added a comment -

        I don't see the leak yet, but in any case the 'else' branch in close() is bullshit.

        Lock is java.io.Closeable now.
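With Lock implementing java.io.Closeable, close() should only release what the instance holds and must tolerate repeated calls. A minimal sketch under that assumption, using a hypothetical `ChannelLock` wrapper around a `java.nio.channels.FileLock` (not Lucene's actual class):

```java
import java.io.Closeable;
import java.io.IOException;
import java.nio.channels.FileLock;

// Hypothetical wrapper (not Lucene's NativeFSLock): close() releases only what this
// instance holds, with no "else" branch, and a second close() is a no-op.
final class ChannelLock implements Closeable {
  private final FileLock lock;

  ChannelLock(FileLock lock) {
    this.lock = lock;
  }

  boolean isLocked() {
    return lock.isValid();
  }

  @Override
  public void close() throws IOException {
    // Closing the channel also releases the lock; closing an already-closed
    // channel has no effect, so repeated calls are harmless.
    lock.channel().close();
  }
}
```

It can then be used in a try-with-resources block, e.g. `try (ChannelLock l = new ChannelLock(channel.lock())) { ... }`.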

        Robert Muir added a comment -

        Here is the bug:

        Index: src/java/org/apache/lucene/store/NativeFSLockFactory.java
        ===================================================================
        --- src/java/org/apache/lucene/store/NativeFSLockFactory.java	(revision 1589085)
        +++ src/java/org/apache/lucene/store/NativeFSLockFactory.java	(working copy)
        @@ -150,7 +150,7 @@
             boolean success = false;
             try {
               lock = channel.tryLock();
        -      success = true;
        +      success = lock != null;
             } catch (IOException | OverlappingFileLockException e) {
               // At least on OS X, we will sometimes get an
               // intermittent "Permission Denied" IOException,
        

        We leak a handle (the channel) when tryLock() returns null without throwing an exception.

        I can fix it this way, or also clean up this code a bit. It's too confusing.
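For reference, here is a self-contained sketch of the obtain path with the one-line fix applied. It uses plain NIO rather than Lucene's surrounding NativeFSLock state (the class and method names are hypothetical) and, unlike the original, simply rethrows IOException after closing the channel. Note that tryLock() returns null when another process holds the lock, which is what LockStressTest provokes with multiple JVMs; contention within one JVM surfaces as OverlappingFileLockException instead.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical sketch of the obtain() pattern with the fix; not Lucene's API.
final class NativeLockAttempt {
  private NativeLockAttempt() {}

  /**
   * Returns a held lock, or null if the file is already locked.
   * On every path that does not return a lock, the channel is closed.
   */
  static FileLock tryObtain(Path lockFile) throws IOException {
    FileChannel channel = FileChannel.open(lockFile,
        StandardOpenOption.CREATE, StandardOpenOption.WRITE);
    boolean success = false;
    try {
      FileLock lock = channel.tryLock(); // null: another process holds the lock
      success = lock != null;            // the fix: null is not success
      return lock;
    } catch (OverlappingFileLockException e) {
      return null; // this JVM already holds a lock on the file
    } finally {
      if (!success) {
        channel.close(); // with "success = true" this was skipped when lock == null
      }
    }
  }
}
```

The caller releases a successful lock with `lock.channel().close()`, which closes the channel and drops the lock in one step.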

        Uwe Schindler added a comment -

        I think we should clean this up further. The code in close() is horrible and makes no sense anymore. The whole NativeFSLock should be as short as SimpleFSLock. No need for all the extra code!

        Robert Muir added a comment -

        Here is my proposed patch. But we can do the one-liner too.

        I just hate the Closeable violations, the hard-to-read code, etc.

        We should also fix test-lock-factory to test SimpleFSLockFactory.

        Robert Muir added a comment -

        Updated patch, removing the crazy test (SimpleFS does not do this stuff on close, and I don't think we should either: it breaks the Closeable contract).

        Uwe Schindler added a comment -

        Here is an improved test.

        Uwe Schindler added a comment -

        +1 to commit with the new tests. Please also backport. I will redo the RC I started (not yet visible...)

        ASF subversion and git services added a comment -

        Commit 1589131 from Robert Muir in branch 'dev/trunk'
        [ https://svn.apache.org/r1589131 ]

        LUCENE-5624: fix NativeFS file handle leak, improve lock testing

        ASF subversion and git services added a comment -

        Commit 1589134 from Robert Muir in branch 'dev/branches/branch_4x'
        [ https://svn.apache.org/r1589134 ]

        LUCENE-5624: fix NativeFS file handle leak, improve lock testing

        ASF subversion and git services added a comment -

        Commit 1589140 from Robert Muir in branch 'dev/branches/lucene_solr_4_8'
        [ https://svn.apache.org/r1589140 ]

        LUCENE-5624: fix NativeFS file handle leak, improve lock testing

        Robert Muir added a comment -

        Thanks Uwe!

        Uwe Schindler added a comment -

        Close issue after release of 4.8.0


          People

          • Assignee:
            Robert Muir
            Reporter:
            Robert Muir
          • Votes:
            0
            Watchers:
            1

            Dates

            • Created:
              Updated:
              Resolved:
