Lucene - Core
  1. Lucene - Core
  2. LUCENE-686

Resources not always reclaimed in scorers after each search

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Not a Problem
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: core/search
    • Labels:
      None
    • Environment:

      All

      Description

      Resources are not always reclaimed in scorers after each search.

      For example, close() is not always called for term docs in TermScorer.

      A test will be attached to show when resources are not reclaimed.

        Activity

        Hide
        Ning Li added a comment -

        A patch is attached:

        • The patch is based on the lastest version from trunk.
        • The patch includes a test called TestScorerResourceGJ which shows resources are not reclaimed after each search without the patch.
        • The patch passes TestScorerResourcesGJ.
        • The patch passes all the unit tests.
        Show
        Ning Li added a comment - A patch is attached: The patch is based on the lastest version from trunk. The patch includes a test called TestScorerResourceGJ which shows resources are not reclaimed after each search without the patch. The patch passes TestScorerResourcesGJ. The patch passes all the unit tests.
        Hide
        Paul Elschot added a comment -

        Three things:

        Is there an actual memory leak problem related to this?

        In ReqExclScorer the two scorers can also be closed when they are set to null.

        It's probably better to use try/finally in IndexSearcher and call close in in the finally clause,
        exceptions are occasionally used to preliminary end a search, although not in the
        lucene core afaik.

        Regards,
        Paul Elschot

        Show
        Paul Elschot added a comment - Three things: Is there an actual memory leak problem related to this? In ReqExclScorer the two scorers can also be closed when they are set to null. It's probably better to use try/finally in IndexSearcher and call close in in the finally clause, exceptions are occasionally used to preliminary end a search, although not in the lucene core afaik. Regards, Paul Elschot
        Hide
        Ning Li added a comment -

        > Is there an actual memory leak problem related to this?

        Right now no. For example, in FS based directories, the index inputs term docs use are clones.
        Close() of cloned index inputs does not close the file descriptor. Only the origianl one does.

        However, memory leak could happen to a new subclass of directory and index input, if cloned
        instances require reclaiming resources. In addition, memory leak could happen to a new subclass
        of scorer, if there are resources associated with the scorer which should be reclaimed when done.

        > In ReqExclScorer the two scorers can also be closed when they are set to null.

        Thanks for pointing this out. I'll double check all scorers and make sure close() are properly called.

        > It's probably better to use try/finally in IndexSearcher and call close in in the finally clause,
        > exceptions are occasionally used to preliminary end a search, although not in the
        > lucene core afaik.

        Will do. Thanks again!

        Cheers,
        Ning

        Show
        Ning Li added a comment - > Is there an actual memory leak problem related to this? Right now no. For example, in FS based directories, the index inputs term docs use are clones. Close() of cloned index inputs does not close the file descriptor. Only the origianl one does. However, memory leak could happen to a new subclass of directory and index input, if cloned instances require reclaiming resources. In addition, memory leak could happen to a new subclass of scorer, if there are resources associated with the scorer which should be reclaimed when done. > In ReqExclScorer the two scorers can also be closed when they are set to null. Thanks for pointing this out. I'll double check all scorers and make sure close() are properly called. > It's probably better to use try/finally in IndexSearcher and call close in in the finally clause, > exceptions are occasionally used to preliminary end a search, although not in the > lucene core afaik. Will do. Thanks again! Cheers, Ning
        Hide
        Paul Elschot added a comment -

        In SpanTermQuery in the spans package, there is also a TermPositions that might
        need to be closed.
        This means that closing would have to be done via Spans in the whole spans package.

        But as long as there is no real a memory leak, what is the point of
        adding this close functionality?

        Show
        Paul Elschot added a comment - In SpanTermQuery in the spans package, there is also a TermPositions that might need to be closed. This means that closing would have to be done via Spans in the whole spans package. But as long as there is no real a memory leak, what is the point of adding this close functionality?
        Hide
        Hoss Man added a comment -

        Quick summary of some discussion from the mailing list...

        1) i replied to paul's comments in the bug indicating that while there may not be any leaks in the core code base, these changes were needed to allow people writing custom Directories or custom Scorers to avoid memory leaks.
        2) paul suggested that people writing custom code can work arround this by subclassing/customizing the Directory, and all the Scorers, and the IndexSearcher
        3) i suggested that made the barrier for new custom code rather high, and made a poor comparison that got us on a tangent.
        4) i argued that since TermDocs had a close method, Scorers needed to call it, which ment they needed a close method which was garunteed to be called.
        5) paul argued that TermDocs.close in the core right now isn't needed, and we might be better off removing it, and requiring any more complicated custom implimentations to rely on GC to clean up any resources they have (presumably using a finalize method)
        6) steven_parkes then raised the point that the fundemental issue is design integrity ... we have to agree what the point of TermDocs.close is from an API standpoint, and that callers should not have to know what the concrete implimentation of hte callee is to know wether close needs to be called. Better documentation on the purpose of the method can lead to better discussion about wether it can be removed, or if the current behavior is a bug that needs fixed.

        Show
        Hoss Man added a comment - Quick summary of some discussion from the mailing list... 1) i replied to paul's comments in the bug indicating that while there may not be any leaks in the core code base, these changes were needed to allow people writing custom Directories or custom Scorers to avoid memory leaks. 2) paul suggested that people writing custom code can work arround this by subclassing/customizing the Directory, and all the Scorers, and the IndexSearcher 3) i suggested that made the barrier for new custom code rather high, and made a poor comparison that got us on a tangent. 4) i argued that since TermDocs had a close method, Scorers needed to call it, which ment they needed a close method which was garunteed to be called. 5) paul argued that TermDocs.close in the core right now isn't needed, and we might be better off removing it, and requiring any more complicated custom implimentations to rely on GC to clean up any resources they have (presumably using a finalize method) 6) steven_parkes then raised the point that the fundemental issue is design integrity ... we have to agree what the point of TermDocs.close is from an API standpoint, and that callers should not have to know what the concrete implimentation of hte callee is to know wether close needs to be called. Better documentation on the purpose of the method can lead to better discussion about wether it can be removed, or if the current behavior is a bug that needs fixed.
        Hide
        Doron Cohen added a comment -

        An example of how current Lucene code relies on not having to close resoures, in PhraseQuery:
        ...
        scorer(IndexReader reader) {
        ...
        for (int i = 0; i < terms.size(); i++)

        { TermPositions p = reader.termPositions((Term)terms.elementAt(i)); if (p == null) return null; < - - - - change would be required here tps[i] = p; }

        If close() has to be respected this code would need to change to close all TermPositions that were obtained just before the one that was not found.

        Show
        Doron Cohen added a comment - An example of how current Lucene code relies on not having to close resoures, in PhraseQuery: ... scorer(IndexReader reader) { ... for (int i = 0; i < terms.size(); i++) { TermPositions p = reader.termPositions((Term)terms.elementAt(i)); if (p == null) return null; < - - - - change would be required here tps[i] = p; } If close() has to be respected this code would need to change to close all TermPositions that were obtained just before the one that was not found.
        Hide
        Steven Parkes added a comment -

        Well, this has been a nice example to drive me into some of the internals.

        It seems like close() methods are around in most cases to provide determinism in operations that have externally visible side effects, e.g., commits in IndexReader.

        But it doesn't seem like TermDocs has this type of behavior or is likely to in the foreseeable future? SegmentReader always keeps open handles to the appropriate files anyway.

        One can certainly posit that someday a version of something will change and TermDocs.close() will do something, but wouldn't YAGNI argue for waiting until that's really the case?

        I still kinda feel like this half-in/half-out (called sometimes, but not always) is kinda the worst of all worlds: it can't be relied on (not always called) and all those calls that don't do anything are (probably insignificant) wasted time. But the biggest thing is the intellectual drag, where people spend time trying to understand something that at least appears inconsistent.

        If this reasoning is right, what do people think about just yanking it out?

        Show
        Steven Parkes added a comment - Well, this has been a nice example to drive me into some of the internals. It seems like close() methods are around in most cases to provide determinism in operations that have externally visible side effects, e.g., commits in IndexReader. But it doesn't seem like TermDocs has this type of behavior or is likely to in the foreseeable future? SegmentReader always keeps open handles to the appropriate files anyway. One can certainly posit that someday a version of something will change and TermDocs.close() will do something, but wouldn't YAGNI argue for waiting until that's really the case? I still kinda feel like this half-in/half-out (called sometimes, but not always) is kinda the worst of all worlds: it can't be relied on (not always called) and all those calls that don't do anything are (probably insignificant) wasted time. But the biggest thing is the intellectual drag, where people spend time trying to understand something that at least appears inconsistent. If this reasoning is right, what do people think about just yanking it out?
        Hide
        Ning Li added a comment -

        But removing TermDocs.close() will leave IndexInput.close() in a
        similar half-in/half-out situation: e.g. close() will not be called
        for freqStream and skipStream in SegmentTermDocs. Yet
        IndexInput.close() cannot be removed (e.g. FSIndexInput).

        Show
        Ning Li added a comment - But removing TermDocs.close() will leave IndexInput.close() in a similar half-in/half-out situation: e.g. close() will not be called for freqStream and skipStream in SegmentTermDocs. Yet IndexInput.close() cannot be removed (e.g. FSIndexInput).
        Hide
        Steven Parkes added a comment -

        You're right, and that's actually at the heart of the issue.

        Seems like part of what needs to be clarified is what clone() does on an IndexInput. It doesn't simply make another copy of the IndexInput. The result is neither independent of nor equivalent to the original. At least for FSIndexInput:

        • not equivalent: close on a clone is different then close on the original
        • not independent: close on the original impacts the behavior of existing clones

        I don't see that that's going to change. The semantics may be a little bit squrrely (and maybe the name clone() isn't the best, if the result isn't a true clone?), but it's obviously not causing any problems.

        I would think the best thing to do would be just document this existing behavior so it can be relied on: clones of IndexInputs do not need to be closed and cannot be relied upon if the clone source is closed.

        If/once that's the documented behavior, it's okay to get rid of all the extra close()es. New implementations of IndexInput just need to abide by that behavior, however they choose to implement it.

        Show
        Steven Parkes added a comment - You're right, and that's actually at the heart of the issue. Seems like part of what needs to be clarified is what clone() does on an IndexInput. It doesn't simply make another copy of the IndexInput. The result is neither independent of nor equivalent to the original. At least for FSIndexInput: not equivalent: close on a clone is different then close on the original not independent: close on the original impacts the behavior of existing clones I don't see that that's going to change. The semantics may be a little bit squrrely (and maybe the name clone() isn't the best, if the result isn't a true clone?), but it's obviously not causing any problems. I would think the best thing to do would be just document this existing behavior so it can be relied on: clones of IndexInputs do not need to be closed and cannot be relied upon if the clone source is closed. If/once that's the documented behavior, it's okay to get rid of all the extra close()es. New implementations of IndexInput just need to abide by that behavior, however they choose to implement it.
        Hide
        Steven Parkes added a comment -

        Came across these:

        http://www.gossamer-threads.com/lists/lucene/java-dev/34085#34085
        http://www.gossamer-threads.com/lists/lucene/java-dev/34121#34121

        Even if we don't change semantics, we could get some of this into the docs.

        Show
        Steven Parkes added a comment - Came across these: http://www.gossamer-threads.com/lists/lucene/java-dev/34085#34085 http://www.gossamer-threads.com/lists/lucene/java-dev/34121#34121 Even if we don't change semantics, we could get some of this into the docs.
        Hide
        Michael McCandless added a comment -

        I just committed some fixes in LUCENE-823 that could apply here. Specifically if we decide that clones should be called, then, MockRAMDirectory is setup to track that they in fact are closed (we would have to turn it on; right now it does not track clones).

        Show
        Michael McCandless added a comment - I just committed some fixes in LUCENE-823 that could apply here. Specifically if we decide that clones should be called, then, MockRAMDirectory is setup to track that they in fact are closed (we would have to turn it on; right now it does not track clones).
        Hide
        Shai Erera added a comment -

        I don't think it's a problem anymore - today our tests verify that all open files are closed by tests, and that applies to Scorers too. Anyway, the issue is inactive since 2007, so closing.

        Show
        Shai Erera added a comment - I don't think it's a problem anymore - today our tests verify that all open files are closed by tests, and that applies to Scorers too. Anyway, the issue is inactive since 2007, so closing.

          People

          • Assignee:
            Unassigned
            Reporter:
            Ning Li
          • Votes:
            2 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development