Lucene - Core / LUCENE-2467

IndexWriter memory leak when large docs are indexed

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.3, 2.3.1, 2.3.2, 2.4, 2.4.1, 2.9, 2.9.1, 2.9.2, 2.9.3, 3.0, 3.0.1, 3.0.2, 3.1, 4.0-ALPHA
    • Fix Version/s: 2.9.3, 3.0.2, 3.1, 4.0-ALPHA
    • Component/s: core/index
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      Spinoff from the java-user thread "IndexWriter and memory usage"...

      IndexWriter has had a long-standing memory leak, dating back to LUCENE-843.

      When the byte/char/int blocks are recycled to the common pool, the
      per-thread DocumentsWriter (DW) classes incorrectly keep references
      to them.

      This is normally not a problem, since these buffers will be reused.

      But if you index a massive document, causing IndexWriter (IW) to
      allocate more than its configured RAM buffer, then the leak happens.
      So you could have a 16 MB RAM buffer set, but if a huge doc required
      allocation of 200 MB worth of arrays, those 200 MB are never freed
      (well, until you close the IW and deref it from the app).

      It's even worse if you use multiple threads: every thread that has
      ever had to index a massive document incorrectly holds onto its own
      set of extra arrays.
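
The leak pattern described above can be illustrated with a minimal, self-contained sketch. This is not Lucene's code: `SharedPool`, `ThreadState`, `flushLeaky`, `flushFixed`, and the block counts are hypothetical names and values chosen for illustration, and the assumption that the pool caps how many blocks it keeps stands in for the RAM buffer limit. The point it tries to show is that dropping blocks from the pool does not free them while a per-thread list still references them.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical sketch of the leak; none of these classes exist in Lucene. */
final class BlockPoolLeakSketch {

    static final int BLOCK_SIZE = 32 * 1024;

    /** Shared pool that keeps at most maxPooledBlocks (stand-in for the RAM buffer limit). */
    static final class SharedPool {
        private final Deque<byte[]> free = new ArrayDeque<>();
        private final int maxPooledBlocks;

        SharedPool(int maxPooledBlocks) {
            this.maxPooledBlocks = maxPooledBlocks;
        }

        /** Reuses a pooled block if one is available, otherwise allocates a new one. */
        byte[] allocate() {
            byte[] block = free.poll();
            return block != null ? block : new byte[BLOCK_SIZE];
        }

        /** Keeps blocks up to the cap; anything beyond it is dropped and left to the GC. */
        void recycle(List<byte[]> blocks) {
            for (byte[] block : blocks) {
                if (free.size() < maxPooledBlocks) {
                    free.push(block);
                }
            }
        }

        int pooled() {
            return free.size();
        }
    }

    /** Per-thread state that tracks the blocks used by the current document. */
    static final class ThreadState {
        private final SharedPool pool;
        private final List<byte[]> blocks = new ArrayList<>();

        ThreadState(SharedPool pool) {
            this.pool = pool;
        }

        void indexDocument(int blocksNeeded) {
            for (int i = 0; i < blocksNeeded; i++) {
                blocks.add(pool.allocate());
            }
        }

        /** Leaky: recycles the blocks but keeps its own references to all of them. */
        void flushLeaky() {
            pool.recycle(blocks);
        }

        /** Fixed: recycles the blocks, then drops the per-thread references. */
        void flushFixed() {
            pool.recycle(blocks);
            blocks.clear();
        }

        int retainedBlocks() {
            return blocks.size();
        }
    }

    public static void main(String[] args) {
        ThreadState leaky = new ThreadState(new SharedPool(4));
        leaky.indexDocument(10); // a "massive" document: 10 blocks against a cap of 4
        leaky.flushLeaky();
        System.out.println("leaky state still references " + leaky.retainedBlocks() + " blocks");

        ThreadState fixed = new ThreadState(new SharedPool(4));
        fixed.indexDocument(10);
        fixed.flushFixed();
        System.out.println("fixed state still references " + fixed.retainedBlocks() + " blocks");
    }
}
```

In the leaky variant the six blocks the pool declined to keep remain reachable through the thread state, so they cannot be collected; with several thread states, each one can pin its own oversized set.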

      1. LUCENE-2467.patch
        3 kB
        Michael McCandless
      2. LUCENE-2467.patch
        3 kB
        Michael McCandless

        Activity

        Michael McCandless added a comment -

        Attached simple patch.

        The patch also fixes a couple other places where we hold onto memory for too long.

        Michael McCandless added a comment -

        A couple more places to fix...

        Michael McCandless added a comment -

        Don't hold onto the last doc/analyzer that a given thread state used; don't reuse postings instances anymore (we don't on trunk anymore either).
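
One common way to avoid holding onto the last document and analyzer is to clear the per-thread fields in a `finally` block once processing ends. The sketch below assumes that approach; `DocState`, its fields, and the `BiConsumer` callback are hypothetical and are not taken from the attached patch.

```java
import java.util.function.BiConsumer;

/** Hypothetical sketch; DocState here is not Lucene's class of the same name. */
final class DocStateSketch {

    static final class DocState {
        Object doc;       // document currently being indexed, if any
        Object analyzer;  // analyzer currently in use, if any

        /** Runs the indexer, then clears both references even if it throws. */
        void process(Object doc, Object analyzer, BiConsumer<Object, Object> indexer) {
            this.doc = doc;
            this.analyzer = analyzer;
            try {
                indexer.accept(this.doc, this.analyzer);
            } finally {
                // Without this, an idle thread state would keep the last
                // (possibly huge) document reachable until its next use.
                this.doc = null;
                this.analyzer = null;
            }
        }
    }
}
```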


          People

          • Assignee:
            Michael McCandless
            Reporter:
            Michael McCandless
          • Votes:
            0
            Watchers:
            2
