Lucene - Core
LUCENE-2387

IndexWriter retains references to Readers used in Fields (memory leak)

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.0.1
    • Fix Version/s: 2.9.3, 3.0.2, 3.1, 4.0-ALPHA
    • Component/s: None
    • Labels: None
    • Lucene Fields: New

      Description

      As described in [1], IndexWriter retains references to the Readers used in Fields, which can lead to large memory leaks when using Tika's ParsingReaders (each ParsingReader can take about 1 MB).

      [2] is a screenshot, taken with Eclipse MAT (Memory Analyzer Tool), of the reference chain from the IndexWriter to the Reader. The chain is the following:

      IndexWriter -> DocumentsWriter -> DocumentsWriterThreadState -> DocFieldProcessorPerThread -> DocFieldProcessorPerField -> Fieldable -> Field (fieldsData)

      -------------
      [1] http://markmail.org/thread/ndmcgffg2mnwjo47
      [2] http://skitch.com/ecerulm/n7643/eclipse-memory-analyzer
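
      A minimal sketch of the scenario (not part of the original report; assumes the Lucene 3.0 API and uses a plain StringReader as a stand-in for a Tika ParsingReader):

      import java.io.Reader;
      import java.io.StringReader;

      import org.apache.lucene.analysis.standard.StandardAnalyzer;
      import org.apache.lucene.document.Document;
      import org.apache.lucene.document.Field;
      import org.apache.lucene.index.IndexWriter;
      import org.apache.lucene.store.RAMDirectory;
      import org.apache.lucene.util.Version;

      public class ReaderRetentionDemo {
          public static void main(String[] args) throws Exception {
              IndexWriter writer = new IndexWriter(new RAMDirectory(),
                  new StandardAnalyzer(Version.LUCENE_30),
                  IndexWriter.MaxFieldLength.UNLIMITED);

              // Stand-in for a Tika ParsingReader (~1 MB each in the reported case).
              Reader content = new StringReader("large extracted document text ...");

              Document doc = new Document();
              doc.add(new Field("body", content)); // the Reader is stored in fieldsData
              writer.addDocument(doc);

              // After addDocument() returns, the Reader is still strongly reachable via
              // IndexWriter -> DocumentsWriter -> ... -> Fieldable -> fieldsData,
              // so it cannot be garbage collected while the per-thread state holds it.
              writer.close();
          }
      }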

      1. LUCENE-2387.patch
        0.6 kB
        Michael McCandless
      2. ASF.LICENSE.NOT.GRANTED--LUCENE-2387-29x.patch
        1 kB
        Michael McCandless

        Activity

        mikemccand Michael McCandless added a comment -

        Attached patch nulls out the Fieldable reference.

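        A minimal sketch of the idea behind the patch (hypothetical class and method names, not the actual 0.6 kB diff): once a document has been processed, the per-field state drops its strong references to the Fieldable instances so the Readers they wrap can be garbage collected.

        import org.apache.lucene.document.Fieldable;

        // Hypothetical simplification of the per-field state in the indexing chain.
        final class PerFieldState {
            Fieldable[] fields = new Fieldable[1];
            int fieldCount;

            void finishDocument() {
                // Clear the cached Fieldable slots after the document has been consumed,
                // so the Field (and the Reader held in its fieldsData) becomes collectable.
                for (int i = 0; i < fieldCount; i++) {
                    fields[i] = null;
                }
                fieldCount = 0;
            }
        }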
        thetaphi Uwe Schindler added a comment -

        As Tokenizers are reused, the Analyzer also holds a reference to the last used Reader. The easy fix would be to unset the Reader in Tokenizer.close(); if that is where the reference is retained in your case, it is easy to do. Tokenizer.close() would then look like this:

        /** By default, closes the input Reader. */
        @Override
        public void close() throws IOException {
            if (input != null) {
                input.close();
                input = null; // <-- new: drop the reference so the Reader can be GC'd
            }
        }
        
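        For context, a small usage sketch (not from the issue; assumes the Lucene 3.0 reusableTokenStream API) showing where the retained reference lives: the analyzer caches the Tokenizer per thread for reuse, so without input = null in close() the last Reader stays reachable through that cached instance.

        import java.io.IOException;
        import java.io.Reader;
        import java.io.StringReader;

        import org.apache.lucene.analysis.TokenStream;
        import org.apache.lucene.analysis.standard.StandardAnalyzer;
        import org.apache.lucene.util.Version;

        public class ReusedTokenizerDemo {
            public static void main(String[] args) throws IOException {
                StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_30);

                // Stand-in for a large Tika ParsingReader.
                Reader big = new StringReader("text extracted by Tika ...");

                TokenStream ts = analyzer.reusableTokenStream("body", big);
                ts.reset();
                while (ts.incrementToken()) { /* consume tokens */ }
                ts.end();
                ts.close(); // without input = null, the cached Tokenizer still points at 'big'

                // The Tokenizer instance remains cached in the analyzer's per-thread
                // state for reuse; only nulling its input in close() frees 'big'.
            }
        }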
        mikemccand Michael McCandless added a comment -

        I agree, Uwe – I'll fold that into the patch. Thanks.

        mikemccand Michael McCandless added a comment -

        29x version of this patch.

        kimchy Shay Banon added a comment -

        Is there a chance that this can also be applied to 3.0.2 / 3.1? It would be really helpful to get this as soon as possible in the next Lucene version.

        mikemccand Michael McCandless added a comment -

        OK I'll backport.

        kimchy Shay Banon added a comment -

        Thanks!


          People

          • Assignee: mikemccand Michael McCandless
          • Reporter: ecerulm Ruben Laguna
          • Votes: 0
          • Watchers: 1
