Lucene - Core
  1. Lucene - Core
  2. LUCENE-1715

DirectoryIndexReader finalize() holding TermInfosReader longer than necessary

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 2.4.1
    • Fix Version/s: 2.9
    • Component/s: core/index
    • Labels:
      None
    • Environment:

      Sun JDK 6 update 12 64-bit, Debian Lenny

    • Lucene Fields:
      New

      Description

      DirectoryIndexReader has a finalize method, which causes the JDK to keep a reference to the object until it can be finalized. SegmentReader and MultiSegmentReader are subclasses that contain references to, potentially, hundreds of megabytes of cached data in a TermInfosReader.

      Some options would be removing finalize() from DirectoryIndexReader (it releases a write lock at the moment) or possibly nulling out references in various close() and doClose() methods throughout the class hierarchy so that the finalizable object doesn't references the Term arrays.

      Original mailing list message:
      http://mail-archives.apache.org/mod_mbox/lucene-java-user/200906.mbox/%3C7A5CB4A7BBCE0C40B81C5145C326C31301A62971@NUMEVP06.na.imtn.com%3E

      1. LUCENE-1715.patch
        4 kB
        Michael McCandless

        Activity

        Hide
        Michael McCandless added a comment -

        I'd be inclined to remove the finalizer at this point. Apps should not be relying on GC to release the write lock, and if they are, at best they get very buggy behavior (eg we don't flush changes to disk, other writers will sometimes to acquire the write lock because GC didn't finalize, etc.),

        If we remove the finalizer then the closed SegmentReader should be GC'd promptly.

        Show
        Michael McCandless added a comment - I'd be inclined to remove the finalizer at this point. Apps should not be relying on GC to release the write lock, and if they are, at best they get very buggy behavior (eg we don't flush changes to disk, other writers will sometimes to acquire the write lock because GC didn't finalize, etc.), If we remove the finalizer then the closed SegmentReader should be GC'd promptly.
        Hide
        Earwin Burrfoot added a comment -

        +1
        I suggest removing finalize() from IndexWriter too.

        If someone forgets to close IR/IW - that's his personal problem. (Which, in the abscence of finalizer he'll going to notice and fix pretty soon)

        Show
        Earwin Burrfoot added a comment - +1 I suggest removing finalize() from IndexWriter too. If someone forgets to close IR/IW - that's his personal problem. (Which, in the abscence of finalizer he'll going to notice and fix pretty soon)
        Hide
        Michael McCandless added a comment -

        I suggest removing finalize() from IndexWriter too.

        +1

        At best you get buggy behavior if somehow you're relying on these finalizers...

        I plan to commit shortly.

        Show
        Michael McCandless added a comment - I suggest removing finalize() from IndexWriter too. +1 At best you get buggy behavior if somehow you're relying on these finalizers... I plan to commit shortly.
        Hide
        Michael McCandless added a comment -

        Thanks Brian!

        Show
        Michael McCandless added a comment - Thanks Brian!
        Hide
        Uwe Schindler added a comment -

        I would still additionally null the references Brian notes for even faster release by GC?

        Show
        Uwe Schindler added a comment - I would still additionally null the references Brian notes for even faster release by GC?
        Hide
        Michael McCandless added a comment -

        Does that really help GC? (I thought not)

        Had we left finalize in place, I agree nulling would help, because stuff can sit in the finalize queue for a while and nulling would have severed it. But now since all of DirectoryReader will be GCd "at once", I didn't think nulling would help?

        Though, I suppose what it would help is when someone closes the reader but then still hangs onto it, so perhaps we should do it?

        Show
        Michael McCandless added a comment - Does that really help GC? (I thought not) Had we left finalize in place, I agree nulling would help, because stuff can sit in the finalize queue for a while and nulling would have severed it. But now since all of DirectoryReader will be GCd "at once", I didn't think nulling would help? Though, I suppose what it would help is when someone closes the reader but then still hangs onto it, so perhaps we should do it?
        Hide
        Uwe Schindler added a comment -

        That was my intention.

        Show
        Uwe Schindler added a comment - That was my intention.
        Hide
        Simon Willnauer added a comment -

        Does that really help GC? (I thought not)

        It really depends on the VM implementation. In a dalvik VM people start nulling stuff all the time because it helps GC. In that case where a lot of memory can be collected I guess its worth to null the references. I generally do not null references but in this case I would really to it.

        so +1 from my side.

        Show
        Simon Willnauer added a comment - Does that really help GC? (I thought not) It really depends on the VM implementation. In a dalvik VM people start nulling stuff all the time because it helps GC. In that case where a lot of memory can be collected I guess its worth to null the references. I generally do not null references but in this case I would really to it. so +1 from my side.
        Hide
        Michael McCandless added a comment -

        Reopening for nulling at least termInfosReader; we should probably null other memory intensive things like deleted docs & norms.

        Show
        Michael McCandless added a comment - Reopening for nulling at least termInfosReader; we should probably null other memory intensive things like deleted docs & norms.
        Hide
        Robert Newson added a comment - - edited

        I wonder if it's also worth examining the (very few) other places that have finalize()?

        The mere presence of a finalize() method triggers different handling by the garbage collector. Since all remaining finalize() methods appear to close resources that should have been closed explicitly, the same principle applies for those as for the resolution of LUCENE-1715?

        Show
        Robert Newson added a comment - - edited I wonder if it's also worth examining the (very few) other places that have finalize()? The mere presence of a finalize() method triggers different handling by the garbage collector. Since all remaining finalize() methods appear to close resources that should have been closed explicitly, the same principle applies for those as for the resolution of LUCENE-1715 ?
        Hide
        Michael McCandless added a comment -

        SimpleFSDirectory.FSIndexInput's finalize() protects you from running out of descriptors, if you fail to close your reader; NativeFSLockFactory protects you if you forget to release the lock (close your writer).

        I agree, I think we should remove both of these.

        Show
        Michael McCandless added a comment - SimpleFSDirectory.FSIndexInput's finalize() protects you from running out of descriptors, if you fail to close your reader; NativeFSLockFactory protects you if you forget to release the lock (close your writer). I agree, I think we should remove both of these.
        Hide
        Earwin Burrfoot added a comment -

        I object nulling references in attempt to speed up GC. It's totally useless on any decent JVM implementation and if someone uses indecent JVM, I doubt he's concerned with his app efficiency.

        Show
        Earwin Burrfoot added a comment - I object nulling references in attempt to speed up GC. It's totally useless on any decent JVM implementation and if someone uses indecent JVM, I doubt he's concerned with his app efficiency.
        Hide
        Earwin Burrfoot added a comment -

        And support removing finalizers everywhere if their only point is to guard against forgotten close().

        Show
        Earwin Burrfoot added a comment - And support removing finalizers everywhere if their only point is to guard against forgotten close().
        Hide
        Michael McCandless added a comment -

        I agree nulling is not a good practice to make GC faster.

        But... for freeing up memory even if the app still holds a reference to the reader after closing it, I think this is in fact worthwhile.

        Show
        Michael McCandless added a comment - I agree nulling is not a good practice to make GC faster. But... for freeing up memory even if the app still holds a reference to the reader after closing it, I think this is in fact worthwhile.
        Hide
        Simon Willnauer added a comment -

        I is def. not good practice and I agree that a decent VM should not care. In some environments you don't have a choice (mobile phones for instance) and if selected pieces of "nulling" code can speed things up we should do it. I will run a benchmark on a dalivk VM (Android) to show the difference with the change. I might not have time today or tomorrow though.
        This change is not visible to anybody using lucene so to me its not that much of a deal.
        To be honest I'm not a fan of doing that at all but in this case it "could" be useful in some corner cases but does not harm anybody.

        Show
        Simon Willnauer added a comment - I is def. not good practice and I agree that a decent VM should not care. In some environments you don't have a choice (mobile phones for instance) and if selected pieces of "nulling" code can speed things up we should do it. I will run a benchmark on a dalivk VM (Android) to show the difference with the change. I might not have time today or tomorrow though. This change is not visible to anybody using lucene so to me its not that much of a deal. To be honest I'm not a fan of doing that at all but in this case it "could" be useful in some corner cases but does not harm anybody.
        Hide
        Earwin Burrfoot added a comment -

        There's in fact one case where nulling harms. I'm going to try making as much of IR as possible immutable and final. Load everything upfront on creation/reopen (or don't load if IR is created for, say, merging). Unlike nulling references, making frequently accessed fields final does have an impact under adequate JVMs.

        Well, nulling can be added now and removed when/if I finish my IR stuff.

        Show
        Earwin Burrfoot added a comment - There's in fact one case where nulling harms. I'm going to try making as much of IR as possible immutable and final. Load everything upfront on creation/reopen (or don't load if IR is created for, say, merging). Unlike nulling references, making frequently accessed fields final does have an impact under adequate JVMs. Well, nulling can be added now and removed when/if I finish my IR stuff.
        Hide
        Michael McCandless added a comment -

        OK, attached patch nulling just a few things... I plan to commit in a day or two.

        Show
        Michael McCandless added a comment - OK, attached patch nulling just a few things... I plan to commit in a day or two.
        Hide
        Michael McCandless added a comment -

        Thanks Brian!

        Show
        Michael McCandless added a comment - Thanks Brian!

          People

          • Assignee:
            Michael McCandless
            Reporter:
            Brian Groose
          • Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development