Solr / SOLR-3392

searcher leak when openSearcher=false

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.0-ALPHA
    • Component/s: None
    • Labels: None

      Description

      It appears we have a searcher leak where SolrIndexSearcher instances are sometimes not being closed.

        Activity

        Dawid Weiss added a comment -

        Are you talking about the "searchers opened: N, closed: N - Y" errors? This has been relatively frequent but I couldn't reproduce.

        Yonik Seeley added a comment -

        While trying to reproduce the file descriptor leak that Sami and others have reported, I went to
        http://localhost:8983/solr/#/singlecore/plugins/core
        and noticed that I had 4 open SolrIndexSearcher instances - which means we have a searcher leak somehow.

        I'm still trying to track down exactly what's causing it, and why none of our tests catch it (even though all of our tests check the number of opens/closes of SolrIndexSearcher instances).

        Dawid Weiss added a comment -

        and why none of our tests catch it

        But they do? There were a number of failures when the number of closes/opens mismatched and failed the build. Maybe I don't understand what you mean.

        Dawid Weiss added a comment -

        This one is quite old, I just found it first –

        build	14-Apr-2012 06:17:47	    [junit] 149810 T1 oas.SolrTestCaseJ4.endTrackingSearchers SEVERE ERROR: SolrIndexSearcher opens=70 closes=69
        

        I always assumed this is something to worry about but never could reproduce it with the same seed.

        Yonik Seeley added a comment -

        But they do? There were a number of failures when the number of closes/opens mismatched and failed the build.

        Those normally don't fail on my box - just our Jenkins box, which is really slow and has weird stuff like a blackhole configured. Talking to Mark, it sounds like the cause of those open/close searcher mismatches is that another thread is still running (like a recovery/replication thread) and still has a searcher open (hence it doesn't look like a searcher management bug).

        The type of bug I'm looking at would be a searcher management bug not related to thread safety and should be 100% reproducible.

        Dawid Weiss added a comment -

        Those normally don't fail on my box - just our Jenkins box which is really slow +

        Ok, thanks for the clarification. Those I mentioned happened on our server, where I was running Lucene tests for some time too (and that isn't slow). But maybe they were side effects of something else.

        Yonik Seeley added a comment -

        I just committed a fix - it was a really simple silly bug. When openSearcher=false, we requested a new searcher but then never decremented it when we were done. Although the getSearcher() code itself had good test coverage (that's the more complicated code I was worried about), the caller in this specific case (DirectUpdateHandler2) did not.
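
        For context, a minimal sketch of the reference-counting contract behind that bug. This is not the actual DirectUpdateHandler2 code; it just illustrates the rule that every holder obtained from getSearcher() must be decref'd when the caller is done with it.

        import org.apache.solr.core.SolrCore;
        import org.apache.solr.search.SolrIndexSearcher;
        import org.apache.solr.util.RefCounted;

        class SearcherUsageSketch {
          // Any code path that asks the core for a searcher gets back a RefCounted
          // holder with its reference count already incremented; forgetting the
          // matching decref() keeps the SolrIndexSearcher (and its index files) open.
          void useSearcherOnce(SolrCore core) {
            RefCounted<SolrIndexSearcher> holder = core.getSearcher();
            try {
              SolrIndexSearcher searcher = holder.get();
              // ... use the searcher ...
            } finally {
              // The leak described above: on the openSearcher=false commit path,
              // this decref() was missing.
              holder.decref();
            }
          }
        }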

        Dawid Weiss added a comment -

        https://builds.apache.org/job/Lucene-Solr-tests-only-trunk/13383/consoleText

        Just a note – a recent build with "SolrIndexSearcher opens=74 closes=73" mismatch (suite level).

        Yonik Seeley added a comment -
           [junit4]    > (@AfterClass output)
           [junit4]   2> 35484 T1048 oash.SnapPuller.fetchLatestIndex SEVERE Master at: http://localhost:40685/solr/replication is not available. Index fetch failed. Exception: Connect to localhost:40685 timed out
           [junit4]   2> 153931 T846 oas.SolrTestCaseJ4.endTrackingSearchers SEVERE ERROR: SolrIndexSearcher opens=74 closes=73
        

        It's often the case that an earlier failure can cause tests to abort in such ways that not everything is cleaned up and you'll also see a searcher mismatch error.

        Perhaps there's a way to tell if a test has already failed, then this open/closed searcher test can be skipped?

        Robert Muir added a comment -

        I think LuceneTestCase.testsFailed boolean variable should work?

        LuceneTestCase.afterClass uses this logic too, for the same reason.
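
        A minimal sketch of that guard, assuming the flag Robert mentions is visible from the Solr test base class and assuming test-only open/close counters on SolrIndexSearcher (the counter fields and the method body are assumptions, not the actual SolrTestCaseJ4 code):

        import static org.junit.Assert.assertEquals;

        import org.apache.lucene.util.LuceneTestCase;
        import org.apache.solr.search.SolrIndexSearcher;
        import org.junit.AfterClass;

        class SearcherTrackingSketch {
          @AfterClass
          public static void endTrackingSearchers() {
            // If a test in the class already failed, cleanup may have been aborted
            // part-way, so an opens/closes mismatch here would only add noise.
            if (LuceneTestCase.testsFailed) {
              return;
            }
            long opens = SolrIndexSearcher.numOpens.get();    // assumed test-only counters
            long closes = SolrIndexSearcher.numCloses.get();
            assertEquals("SolrIndexSearcher opens=" + opens + " closes=" + closes, opens, closes);
          }
        }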

        Dawid Weiss added a comment -

        Perhaps there's a way to tell if a test has already failed, then this open/closed searcher test can be skipped?

        What Robert pointed to is ok. The "nicest" way to clean up on success only is to write a test rule and do the cleanup / make the assertions only if the sub-statement passes without throwing an exception. @Before*/@After* hooks are always executed, regardless of the test outcome (failure or no failure).

        I realize the number of nested stack entries from rules can bother some, but it is really a clean separation of concerns, and the ordering is then explicit (both in the code and at runtime), which is nice.

        I'm not asking you to fix this – I will be biting into LuceneTestCase at some point (I have some urgent stuff piled up right now) and will try to clean up the @Before*/@After* hooks as well, so that they're not a single blob of various checks and things.
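
        A sketch of the rule-based approach Dawid describes, using the stock JUnit 4 TestRule API (the class name and the leak check below are illustrative only):

        import org.junit.rules.TestRule;
        import org.junit.runner.Description;
        import org.junit.runners.model.Statement;

        class AssertSearchersClosedRule implements TestRule {
          @Override
          public Statement apply(final Statement base, Description description) {
            return new Statement() {
              @Override
              public void evaluate() throws Throwable {
                base.evaluate();          // run the wrapped test; if it throws, the check below is skipped
                assertNoSearcherLeaks();  // only asserted when the sub-statement passed
              }
            };
          }

          private void assertNoSearcherLeaks() {
            // placeholder for the opens/closes comparison
          }
        }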


          People

          • Assignee: Unassigned
          • Reporter: Yonik Seeley
          • Votes: 1
          • Watchers: 2
