Lucene - Core
LUCENE-2387

IndexWriter retains references to Readers used in Fields (memory leak)

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.0.1
    • Fix Version/s: 2.9.3, 3.0.2, 3.1, 4.0-ALPHA
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      As described in [1], IndexWriter retains references to the Readers used in Fields, which can lead to large memory leaks when using Tika's ParsingReaders (each ParsingReader can take about 1 MB).

      [2] shows a screenshot of the reference chain from the IndexWriter to the Reader, taken with Eclipse MAT (Memory Analyzer Tool). The chain is the following:

      IndexWriter -> DocumentsWriter -> DocumentsWriterThreadState -> DocFieldProcessorPerThread -> DocFieldProcessorPerField -> Fieldable -> Field (fieldsData)

      -------------
      [1] http://markmail.org/thread/ndmcgffg2mnwjo47
      [2] http://skitch.com/ecerulm/n7643/eclipse-memory-analyzer
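
      The leak pattern described above can be sketched in isolation. The sketch below is a hypothetical, minimal stand-in for the per-field state in the chain (class and field names other than fieldsData are illustrative, not Lucene's actual classes): the slot caches the last Fieldable's data, so the Reader inside stays reachable from the IndexWriter until the reference is explicitly nulled, which is what the attached patch does.

      ```java
      import java.io.Reader;
      import java.io.StringReader;

      // Hypothetical stand-in for DocFieldProcessorPerField: it caches the
      // last field's data, so the Reader stays reachable after the document
      // is done unless the reference is cleared.
      class FieldSlot {
          Object fieldsData; // holds the Reader after the field is consumed

          void process(Reader r) {
              fieldsData = r; // ... tokenize/index the Reader's content ...
          }

          void finishDocument() {
              fieldsData = null; // the fix: drop the reference so the Reader can be GC'd
          }
      }

      public class LeakSketch {
          public static void main(String[] args) {
              FieldSlot slot = new FieldSlot();
              slot.process(new StringReader("large parsed content, e.g. from Tika"));
              // Without finishDocument(), slot.fieldsData pins the Reader in memory.
              slot.finishDocument();
              System.out.println("reference cleared: " + (slot.fieldsData == null));
          }
      }
      ```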

      1. LUCENE-2387.patch
        0.6 kB
        Michael McCandless
      2. ASF.LICENSE.NOT.GRANTED--LUCENE-2387-29x.patch
        1 kB
        Michael McCandless

        Activity

        Michael McCandless added a comment -

        Attached patch nulls out the Fieldable reference.

        Uwe Schindler added a comment -

        As Tokenizers are reused, the analyzer also holds a reference to the last-used Reader. The easy fix is to unset the Reader in Tokenizer.close(). If that is the cause in your case, the fix is simple; Tokenizer.close() would then look like this:

        /** By default, closes the input Reader. */
        @Override
        public void close() throws IOException {
            input.close();
            input = null; // <-- new!
        }
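
        To show why that one line matters, here is a hypothetical, minimal sketch of the reuse cycle (the class and method names are illustrative, not Lucene's actual API): the analyzer keeps one tokenizer instance per thread and swaps Readers in via reset(), so without nulling in close() the last Reader stays pinned between documents.

        ```java
        import java.io.IOException;
        import java.io.Reader;
        import java.io.StringReader;

        // Hypothetical minimal reusable tokenizer: one instance is kept per
        // thread and re-pointed at each new document's Reader via reset().
        class ReusableTokenizer {
            Reader input;

            void reset(Reader r) {
                input = r;
            }

            void close() throws IOException {
                input.close();
                input = null; // without this, the last Reader stays reachable
            }
        }

        public class TokenizerReuseSketch {
            public static void main(String[] args) throws IOException {
                ReusableTokenizer tok = new ReusableTokenizer();
                tok.reset(new StringReader("first document"));
                tok.close();
                // After close(), the tokenizer no longer pins the Reader, so it
                // can be garbage-collected even before the next reset().
                System.out.println("input cleared: " + (tok.input == null));
            }
        }
        ```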
        
        Michael McCandless added a comment -

        I agree, Uwe – I'll fold that into the patch. Thanks.

        Michael McCandless added a comment -

        29x version of this patch.

        Shay Banon added a comment -

        Is there a chance that this can also be applied to 3.0.2 / 3.1? It would be really helpful to get this as soon as possible in the next Lucene version.

        Michael McCandless added a comment -

        OK I'll backport.

        Shay Banon added a comment -

        Thanks!


          People

          • Assignee: Michael McCandless
          • Reporter: Ruben Laguna
          • Votes: 0
          • Watchers: 1

            Dates

            • Created:
              Updated:
              Resolved:
