Lucene - Core
  1. Lucene - Core
  2. LUCENE-2108

SpellChecker file descriptor leak - no way to close the IndexSearcher used by SpellChecker internally

    Details

    • Lucene Fields:
      New, Patch Available

      Description

      I can't find any way to close the IndexSearcher (and IndexReader) that
      is being used by SpellChecker internally.

      I've worked around this issue by keeping a single SpellChecker open
      for each index, but I'd really like to be able to close it and
      reopen it on demand without leaking file descriptors.

      Could we add a close() method to SpellChecker that will close the
      IndexSearcher and null the reference to it? And perhaps add some code
      that reopens the searcher if the reference to it is null? Or would
      that break thread safety of SpellChecker?

      The attached patch adds a close method but leaves it to the user to
      call setSpellIndex to reopen the searcher if desired.

      1. LUCENE-2108_Lucene_2_9_branch.patch
        28 kB
        Simon Willnauer
      2. LUCENE-2108_test_java14.patch
        3 kB
        Simon Willnauer
      3. LUCENE-2108.patch
        26 kB
        Simon Willnauer
      4. LUCENE-2108.patch
        23 kB
        Simon Willnauer
      5. LUCENE-2108.patch
        25 kB
        Simon Willnauer
      6. LUCENE-2108.patch
        4 kB
        Simon Willnauer
      7. LUCENE-2108-SpellChecker-close.patch
        0.6 kB
        Eirik Bjorsnos

        Issue Links

          Activity

          Hide
          Uwe Schindler added a comment -

          Committed, thanks!

          Show
          Uwe Schindler added a comment - Committed, thanks!
          Hide
          Simon Willnauer added a comment -

          Created sep. issue for that purpose LUCENE-2196

          Show
          Simon Willnauer added a comment - Created sep. issue for that purpose LUCENE-2196
          Hide
          Uwe Schindler added a comment -

          +1, because everything else that has close() in Lucene now implements this.

          Show
          Uwe Schindler added a comment - +1, because everything else that has close() in Lucene now implements this.
          Hide
          Achim Heiland added a comment -

          It would be nice if the SpellChecker implements the java.io.Closeable interface (Java 1.5)

          Show
          Achim Heiland added a comment - It would be nice if the SpellChecker implements the java.io.Closeable interface (Java 1.5)
          Hide
          Michael McCandless added a comment -

          OK committed on 2.9 as well – thanks Simon!

          Show
          Michael McCandless added a comment - OK committed on 2.9 as well – thanks Simon!
          Hide
          Simon Willnauer added a comment -

          Mike, I merged the branch with our current trunk and fixed all 1.5 code to compile and run with 1.4
          Hope that helps

          Show
          Simon Willnauer added a comment - Mike, I merged the branch with our current trunk and fixed all 1.5 code to compile and run with 1.4 Hope that helps
          Hide
          Michael McCandless added a comment -

          SImon, any chance you could work out the patch for SpellChecker.java on 2.9.x as well? I got a zillion conflicts on "svn merge" Thanks!

          Show
          Michael McCandless added a comment - SImon, any chance you could work out the patch for SpellChecker.java on 2.9.x as well? I got a zillion conflicts on "svn merge" Thanks!
          Hide
          Michael McCandless added a comment -

          Backported to 3.0... 2.9 next.

          Show
          Michael McCandless added a comment - Backported to 3.0... 2.9 next.
          Hide
          Simon Willnauer added a comment -

          Mike, I changed the testcase to be 1.4 compatible this might help you to merge the spellchecker into 2.9.1 since I can not commit into branches.
          It does not make sense to create a patch against the branch as you really want the mergeinfo and don't do it by patching things in branches.

          simon

          Show
          Simon Willnauer added a comment - Mike, I changed the testcase to be 1.4 compatible this might help you to merge the spellchecker into 2.9.1 since I can not commit into branches. It does not make sense to create a patch against the branch as you really want the mergeinfo and don't do it by patching things in branches. simon
          Hide
          Simon Willnauer added a comment -

          Thanks for fixing this!

          YW! very good feedback - I will port it to 2.9 soon.

          simon

          Show
          Simon Willnauer added a comment - Thanks for fixing this! YW! very good feedback - I will port it to 2.9 soon. simon
          Hide
          Shalin Shekhar Mangar added a comment -

          I ran into index corruption during stress testing in SOLR-785. After upgrading contrib-spellcheck to lucene trunk, those issues are no longer reproducible. You guys have saved me a lot of time

          Thanks for fixing this!

          Show
          Shalin Shekhar Mangar added a comment - I ran into index corruption during stress testing in SOLR-785 . After upgrading contrib-spellcheck to lucene trunk, those issues are no longer reproducible. You guys have saved me a lot of time Thanks for fixing this!
          Hide
          Simon Willnauer added a comment -

          Mike, I just realized that we need to change the test as it uses java5
          classes. I will provide you a patch compatible w. 1.4 later.

          On Dec 5, 2009 2:35 PM, "Michael McCandless (JIRA)" <jira@apache.org> wrote:

          [
          https://issues.apache.org/jira/browse/LUCENE-2108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12786406#action_12786406]

          Michael McCandless commented on LUCENE-2108:
          --------------------------------------------

          We should backport this change to 2.9 - can you commit that please, I

          can not though.

          And, to 3.0. OK will do...

          by SpellChecker internally
          -----------------------------------------------------------------------------------------------------
          LUCENE-2108.patch, LUCENE-2108.patch, LUCENE-2108.patch, LUCENE-2108.patch


          This message is automatically generated by JIRA.
          -
          You can reply to this email to add a comment to the issue online.

          Show
          Simon Willnauer added a comment - Mike, I just realized that we need to change the test as it uses java5 classes. I will provide you a patch compatible w. 1.4 later. On Dec 5, 2009 2:35 PM, "Michael McCandless (JIRA)" <jira@apache.org> wrote: [ https://issues.apache.org/jira/browse/LUCENE-2108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12786406#action_12786406 ] Michael McCandless commented on LUCENE-2108 : -------------------------------------------- We should backport this change to 2.9 - can you commit that please, I can not though. And, to 3.0. OK will do... by SpellChecker internally ----------------------------------------------------------------------------------------------------- LUCENE-2108 .patch, LUCENE-2108 .patch, LUCENE-2108 .patch, LUCENE-2108 .patch – This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
          Hide
          Michael McCandless added a comment -

          We should backport this change to 2.9 - can you commit that please, I can not though.

          And, to 3.0. OK will do...

          Show
          Michael McCandless added a comment - We should backport this change to 2.9 - can you commit that please, I can not though. And, to 3.0. OK will do...
          Hide
          Simon Willnauer added a comment -

          committed in revision 887532

          Mike, thanks for review. We should backport this change to 2.9 - can you commit that please, I can not though.

          Show
          Simon Willnauer added a comment - committed in revision 887532 Mike, thanks for review. We should backport this change to 2.9 - can you commit that please, I can not though.
          Hide
          Simon Willnauer added a comment -

          I will commit this tomorrow if nobody objects.

          Show
          Simon Willnauer added a comment - I will commit this tomorrow if nobody objects.
          Hide
          Michael McCandless added a comment -

          Patch looks good Simon!

          Show
          Michael McCandless added a comment - Patch looks good Simon!
          Hide
          Simon Willnauer added a comment -

          Only small concern... you hold the lock while opening the new searcher....

          I fixed that one - very important for performance though!

          I found another issue, write access is not synchronized at all so it is possible to concurrently reindex or at least call indexDictionary() concurrently. In the first place this is not a huge issue as the writer will raise an exception but if the spellIndex is reset while the indexDicitonary is still in progress we could have inconsistencies with searcher and spellindex.
          I added another lock for write operations for now,

          Show
          Simon Willnauer added a comment - Only small concern... you hold the lock while opening the new searcher.... I fixed that one - very important for performance though! I found another issue, write access is not synchronized at all so it is possible to concurrently reindex or at least call indexDictionary() concurrently. In the first place this is not a huge issue as the writer will raise an exception but if the spellIndex is reset while the indexDicitonary is still in progress we could have inconsistencies with searcher and spellindex. I added another lock for write operations for now,
          Hide
          Michael McCandless added a comment -

          Looks good! Nice and simple.

          Only small concern... you hold the lock while opening the new searcher. It would be better to open the new searcher without the lock, then only acquire the lock to do the swap; this way any respell requests that come in don't block while the searcher is being opened (because obtainSearcher() needs to get the lock).

          Show
          Michael McCandless added a comment - Looks good! Nice and simple. Only small concern... you hold the lock while opening the new searcher. It would be better to open the new searcher without the lock, then only acquire the lock to do the swap; this way any respell requests that come in don't block while the searcher is being opened (because obtainSearcher() needs to get the lock).
          Hide
          Simon Willnauer added a comment -

          updated patch - this one does not use a Holder class or any java 5 classes for backcompat with 2.9

          Show
          Simon Willnauer added a comment - updated patch - this one does not use a Holder class or any java 5 classes for backcompat with 2.9
          Hide
          Simon Willnauer added a comment -

          If you back-port this to 2.9, you can't use any of the java.util.concurrent.*

          Very good point! - didn't thought about back porting at all.

          I'm not sure you need a separate SearcherHolder class - can't you re-use IndexReader's decRef/incRef?

          I guess I did not see the simplicity the reader offers - blind due to java.util.concurrent.*

          I think there are some thread safety issues..

          this is weird - on my dev machine and in the patch it is not synchronized.. on the machine I run the tests it is. Anyway you are right.

          I changed the code to be compatible with 2.9 using indexReaders.dec/incRef.. will attache in a minute

          Show
          Simon Willnauer added a comment - If you back-port this to 2.9, you can't use any of the java.util.concurrent.* Very good point! - didn't thought about back porting at all. I'm not sure you need a separate SearcherHolder class - can't you re-use IndexReader's decRef/incRef? I guess I did not see the simplicity the reader offers - blind due to java.util.concurrent.* I think there are some thread safety issues.. this is weird - on my dev machine and in the patch it is not synchronized.. on the machine I run the tests it is. Anyway you are right. I changed the code to be compatible with 2.9 using indexReaders.dec/incRef.. will attache in a minute
          Hide
          Michael McCandless added a comment -

          Some feedback on the patch:

          • If you back-port this to 2.9, you can't use any of the
            java.util.concurrent.*
          • I'm not sure you need a separate SearcherHolder class – can't you
            re-use IndexReader's decRef/incRef?
          • You don't need to do this.XXX in most places (maybe you're coming
            from eg Python? ).
          • Maybe rename "releaseNewSearcher" -> "swapSearcher"? (Because it
            releases the old one and installs the new one).
          • I think there are some thread safety issues – eg
            getSearcherHolder isn't sync'd, so, when you incRef
            this.searcherHolder at the start, but then return
            this.searcherHolder at the end, it could be two different
            instances.
          Show
          Michael McCandless added a comment - Some feedback on the patch: If you back-port this to 2.9, you can't use any of the java.util.concurrent.* I'm not sure you need a separate SearcherHolder class – can't you re-use IndexReader's decRef/incRef? You don't need to do this.XXX in most places (maybe you're coming from eg Python? ). Maybe rename "releaseNewSearcher" -> "swapSearcher"? (Because it releases the old one and installs the new one). I think there are some thread safety issues – eg getSearcherHolder isn't sync'd, so, when you incRef this.searcherHolder at the start, but then return this.searcherHolder at the end, it could be two different instances.
          Hide
          Simon Willnauer added a comment -

          This patch adds a close operation to SpellChecker and enables thread-safety.
          I added a testcase for concurrency as well as the close method - comments and review welcome!

          Show
          Simon Willnauer added a comment - This patch adds a close operation to SpellChecker and enables thread-safety. I added a testcase for concurrency as well as the close method - comments and review welcome!
          Hide
          Michael McCandless added a comment -

          I would keep that inside this issue and rather add close + make it threadsafe in one go.

          OK that sounds good then!

          Show
          Michael McCandless added a comment - I would keep that inside this issue and rather add close + make it threadsafe in one go. OK that sounds good then!
          Hide
          Simon Willnauer added a comment -

          Eirik, could you open a new issue to address SpellChecker's non-thread-safety? I actually thing simply documenting clearly that it's not thread safe is fine.

          Mike, IMO thread-safety and close() are closely related. If you close the spellchecker you don't want other threads to access the spellchecker anymore. I would keep that inside this issue and rather add close + make it threadsafe in one go.
          Since spellchecker instances are shared between threads already we should rather try to make it thread-safe than documenting it. I see this as a bug though as you really want to share the spellchecker (essentially the searcher) instance instead of opening one for each thread.

          Show
          Simon Willnauer added a comment - Eirik, could you open a new issue to address SpellChecker's non-thread-safety? I actually thing simply documenting clearly that it's not thread safe is fine. Mike, IMO thread-safety and close() are closely related. If you close the spellchecker you don't want other threads to access the spellchecker anymore. I would keep that inside this issue and rather add close + make it threadsafe in one go. Since spellchecker instances are shared between threads already we should rather try to make it thread-safe than documenting it. I see this as a bug though as you really want to share the spellchecker (essentially the searcher) instance instead of opening one for each thread.
          Hide
          Michael McCandless added a comment -

          Just a reminder - we need to fix the CHANGES.TXT entry once this is done.

          Simon how about you do this, and take this issue (to commit your improvement to throw ACE not NPE)? Thanks

          Show
          Michael McCandless added a comment - Just a reminder - we need to fix the CHANGES.TXT entry once this is done. Simon how about you do this, and take this issue (to commit your improvement to throw ACE not NPE)? Thanks
          Hide
          Michael McCandless added a comment -

          Eirik, could you open a new issue to address SpellChecker's non-thread-safety? I actually thing simply documenting clearly that it's not thread safe is fine.

          Show
          Michael McCandless added a comment - Eirik, could you open a new issue to address SpellChecker's non-thread-safety? I actually thing simply documenting clearly that it's not thread safe is fine.
          Hide
          Eirik Bjorsnos added a comment -

          Mike,

          Please account for my demonstrated stupidity when considering this suggestion for thread safety policy / goals:

          1) Concurrent invocations of suggestSimilar() should not interfere with each other.
          2) An invocation of any of the write methods (setSpellIndex, clearIndex, indexDictionary) should not interfere with aleady invoked suggestSimilar
          3) All calls to write methods should be serialized (We could probably synchronize these methods?)

          If we synchronize any writes to the searcher reference, couldn't suggestSimilar just start its work by putting searcher in a local variable and use that instead of the field?

          I guess concurrency is hard to get right..

          Show
          Eirik Bjorsnos added a comment - Mike, Please account for my demonstrated stupidity when considering this suggestion for thread safety policy / goals: 1) Concurrent invocations of suggestSimilar() should not interfere with each other. 2) An invocation of any of the write methods (setSpellIndex, clearIndex, indexDictionary) should not interfere with aleady invoked suggestSimilar 3) All calls to write methods should be serialized (We could probably synchronize these methods?) If we synchronize any writes to the searcher reference, couldn't suggestSimilar just start its work by putting searcher in a local variable and use that instead of the field? I guess concurrency is hard to get right..
          Hide
          Simon Willnauer added a comment -

          Just a reminder - we need to fix the CHANGES.TXT entry once this is done.

          Show
          Simon Willnauer added a comment - Just a reminder - we need to fix the CHANGES.TXT entry once this is done.
          Hide
          Simon Willnauer added a comment -

          I'd assume ensureOpen needs to be synchronized some way so that two threads can't open IndexSearchers concurrently?

          this class is not threadsafe anyway. If you look at this snippet:

           // close the old searcher, if there was one
          if (searcher != null) {
            searcher.close();
          }
          searcher = new IndexSearcher(this.spellIndex, true);
          

          there could be a race if you concurrently reindex or set a new dictionary. IMO this should either be documented or made threadsafe. The close method should invalidate the spellchecker - it should not be possible to use a already closed Spellchecker.
          The searcher should be somehow ref counted so that if there is a searcher still in use you can concurrently reindex / add a new dictionary to ensure that the same searcher is used throughout suggestSimilar().

          I will take care of it once I get back tomorrow.

          Show
          Simon Willnauer added a comment - I'd assume ensureOpen needs to be synchronized some way so that two threads can't open IndexSearchers concurrently? this class is not threadsafe anyway. If you look at this snippet: // close the old searcher, if there was one if (searcher != null ) { searcher.close(); } searcher = new IndexSearcher( this .spellIndex, true ); there could be a race if you concurrently reindex or set a new dictionary. IMO this should either be documented or made threadsafe. The close method should invalidate the spellchecker - it should not be possible to use a already closed Spellchecker. The searcher should be somehow ref counted so that if there is a searcher still in use you can concurrently reindex / add a new dictionary to ensure that the same searcher is used throughout suggestSimilar(). I will take care of it once I get back tomorrow.
          Hide
          Eirik Bjorsnos added a comment -

          Well not exactly. Simon's suggestion was just to throw an AlreadyClosedException instead of a NullPointerException which is probably ok and definitely easier.

          Show
          Eirik Bjorsnos added a comment - Well not exactly. Simon's suggestion was just to throw an AlreadyClosedException instead of a NullPointerException which is probably ok and definitely easier.
          Hide
          Michael McCandless added a comment -

          Reopening to get the AlreadyClosedException in there...

          Show
          Michael McCandless added a comment - Reopening to get the AlreadyClosedException in there...
          Hide
          Eirik Bjorsnos added a comment -

          Simon,

          Yes, that sound excactly like what I was thinking when I said "some code
          that reopens the searcher if the reference to it is null".

          I just didn't include it in my patch because I couldn't figure out how to do it properly.

          I'd assume ensureOpen needs to be synchronized some way so that two threads can't open IndexSearchers concurrently?

          Show
          Eirik Bjorsnos added a comment - Simon, Yes, that sound excactly like what I was thinking when I said "some code that reopens the searcher if the reference to it is null". I just didn't include it in my patch because I couldn't figure out how to do it properly. I'd assume ensureOpen needs to be synchronized some way so that two threads can't open IndexSearchers concurrently?
          Hide
          Simon Willnauer added a comment -

          Something like that would be more appropriate IMO

          Show
          Simon Willnauer added a comment - Something like that would be more appropriate IMO
          Hide
          Simon Willnauer added a comment -

          Mike / Eirik,

          If you set the searcher to null you might risk a NPE if suggestSimilar() or other methods are called afterwards. I would like to see something like ensureOpen() which throws an AlreadyClosedException or something similar. I will upload a suggestion in a second but need to run so tread it just as a suggestion.

          Simon

          Show
          Simon Willnauer added a comment - Mike / Eirik, If you set the searcher to null you might risk a NPE if suggestSimilar() or other methods are called afterwards. I would like to see something like ensureOpen() which throws an AlreadyClosedException or something similar. I will upload a suggestion in a second but need to run so tread it just as a suggestion. Simon
          Hide
          Michael McCandless added a comment -

          Thanks Eirik!

          Show
          Michael McCandless added a comment - Thanks Eirik!
          Hide
          Michael McCandless added a comment -

          Dude, you have be to a human to make mistakes as stupid as these!

          Good point

          Show
          Michael McCandless added a comment - Dude, you have be to a human to make mistakes as stupid as these! Good point
          Hide
          Eirik Bjorsnos added a comment -

          Dude, you have be to a human to make mistakes as stupid as these!

          (pubic void close, public void close, public void close...)

          Show
          Eirik Bjorsnos added a comment - Dude, you have be to a human to make mistakes as stupid as these! (pubic void close, public void close, public void close...)
          Hide
          Michael McCandless added a comment -

          Note that you said "private" again I'm starting to wonder if you are not human! Is this a turing test?

          OK, ok, I'll make it public, and port back to the 3.0 branch!

          Show
          Michael McCandless added a comment - Note that you said "private" again I'm starting to wonder if you are not human! Is this a turing test? OK, ok, I'll make it public, and port back to the 3.0 branch!
          Hide
          Eirik Bjorsnos added a comment -

          Haha, this is why I said the patch should be "pretty" trivial, instead of just "trivial"

          Yes, it should certainly be private. No idea how that happend. Must have been sleeping at the keyboad.

          Show
          Eirik Bjorsnos added a comment - Haha, this is why I said the patch should be "pretty" trivial, instead of just "trivial" Yes, it should certainly be private. No idea how that happend. Must have been sleeping at the keyboad.
          Hide
          Michael McCandless added a comment -

          Shouldn't the new close() method be public?

          Show
          Michael McCandless added a comment - Shouldn't the new close() method be public?
          Hide
          Eirik Bjorsnos added a comment -

          Patch that adds a close method to SpellChecker. The method calls close on the searcher used and then nulls the reference so that a new IndexSearcher will be created by the next call to setSpellIndex

          Show
          Eirik Bjorsnos added a comment - Patch that adds a close method to SpellChecker. The method calls close on the searcher used and then nulls the reference so that a new IndexSearcher will be created by the next call to setSpellIndex

            People

            • Assignee:
              Simon Willnauer
              Reporter:
              Eirik Bjorsnos
            • Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development