Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.9, 2.0.0, 2.1, 2.2, 2.3, 2.3.1, 2.3.2
    • Fix Version/s: 2.4
    • Component/s: core/index
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      Java's ThreadLocal is dangerous to use because it can take a
      surprisingly long time to release references to the values you store
      in it. Even after a ThreadLocal instance itself is GC'd, hard
      references to the values you had stored in it are easily kept for
      quite some time.

      While this is not technically a "memory leak", because the hard
      reference is eventually cleared (once the underlying Map that stores
      the values cleans up its "stale" entries) and GC can then proceed,
      the end behavior is no different from a memory leak: under the right
      circumstances you can easily tie up far more memory than you'd
      expect, and then hit an unexpected OOM error despite allocating an
      extremely large heap to your JVM.
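
      As an illustration (the demo class below is hypothetical, not Lucene
      code), here is a minimal sketch of the retention pattern described
      above, assuming the standard ThreadLocalMap behavior in Sun's JDK:

```java
// Hypothetical demo: the value stored in a ThreadLocal stays strongly reachable
// from the thread's internal ThreadLocalMap entry until that map happens to
// expunge its stale entries, even after the ThreadLocal itself is unreachable.
public class ThreadLocalRetentionDemo {
    public static void main(String[] args) throws Exception {
        ThreadLocal<byte[]> slot = new ThreadLocal<byte[]>();
        slot.set(new byte[100 * 1024 * 1024]);   // ~100 MB now referenced by this thread's map

        slot = null;       // drop our only reference to the ThreadLocal itself
        System.gc();       // the ThreadLocal object may be collected ...

        // ... but the 100 MB value is usually still reachable here: the map entry's
        // key is a WeakReference to the ThreadLocal, while the value is a hard
        // reference that is only cleared lazily (e.g. on a later get()/set() of some
        // other ThreadLocal from this same thread). Until then the heap stays pinned.
        Thread.sleep(60000);   // keep the thread (and therefore its map) alive
    }
}
```

      With long-lived threads, such as app-server pool threads, that lazy
      cleanup can be deferred almost indefinitely.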

      Lucene users have hit this many times. Here's the most recent thread:

      http://mail-archives.apache.org/mod_mbox/lucene-java-dev/200809.mbox/%3C6e3ae6310809091157j7a9fe46bxcc31f6e63305fcdc%40mail.gmail.com%3E

      And here's another:

      http://mail-archives.apache.org/mod_mbox/lucene-java-dev/200807.mbox/%3CF5FC94B2-E5C7-40C0-8B73-E12245B91CEE%40mikemccandless.com%3E

      And then there's LUCENE-436 and LUCENE-529 at least.

      A Google search for "ThreadLocal leak" yields many compelling hits.

      Sun does this for performance reasons, but I think it's a terrible
      trap, and we should work around it in Lucene.

      Attachments

      1. ScreenHunter_07 Sep. 13 19.13.jpg
        17 kB
        Chris Lu
      2. ScreenHunter_03 Sep. 13 08.43.jpg
        16 kB
        Chris Lu
      3. ScreenHunter_02 Sep. 13 08.42.jpg
        16 kB
        Chris Lu
      4. ScreenHunter_01 Sep. 13 08.40.jpg
        15 kB
        Chris Lu
      5. LUCENE-1383.patch
        7 kB
        Michael McCandless

        Activity

        adrian.tarau Adrian Tarau added a comment -

        If closing all index readers and writers releases all Lucene thread locals, that's great.

        Thanks.

        rengels@ix.netcom.com robert engels added a comment -

        You cannot control this "externally", since there is no way to force the cleaning of the stale entries (there is no API method).

        The current patch sort of allows this via close() - the entries are not cleared, but the weak references in them are - but if this method is called at the wrong time, everything will break (since the stream needs to be cached), so it is not exposed to the outside world.

        If you want to clean up programmatically, close the index readers and writers.

        adrian.tarau Adrian Tarau added a comment -

        Michael,

        Maybe some sort of cleanup method should be created, even if regular users will not call it.

        I presume it could be highlighted somewhere in the documentation, for example in "Best Practices": http://wiki.apache.org/lucene-java/BestPractices

        I would really like to be able to control this - to be able to clean the threads' storage programmatically.

        adrian.tarau Adrian Tarau added a comment -

        OK, maybe "fixed" was too strong, and it made it sound like ThreadLocal itself has a problem. ThreadLocal works fine, except that it is tricky with long-lived threads.

        Anyway, it is a good article in case somebody wants to understand the reason for this change in Lucene.

        mikemccand Michael McCandless added a comment -

        > The current patch solves the problem suitably for Lucene, at the expense of some performance degradation.

        The performance degradation should be negligible in the case of long-lived threads. No synchronized code was added to the get() path.

        > It is also not valid, since you need to clear all values across all threads; remove() only clears the entry for the calling thread.

        That best practice does work correctly: you're supposed to call remove() from the very thread that inserted something into the ThreadLocal.

        Besides the 1.5 issue, this is also difficult for Lucene because we don't have a clean point to "know" when a thread has finished interacting with Lucene. A thread comes out of the pool, runs a search, gathers docIDs in a collector, returns from the search, one by one looks up stored docs / term vectors for those docIDs, maybe does a secondary search up front to build up filters, etc., finishes rendering the result and returns to the pool. Unless we create a new method "detachThread(...)" somewhere in Lucene, there is no natural point to do the remove(). And I don't like creating such a method because nobody will ever know they need to call it in their app server until they start hitting cryptic OOMs.

        rengels@ix.netcom.com robert engels added a comment -

        It doesn't need to be "fixed".

        It works fine; you need to understand how to use it properly, which may require calling remove() if your threads are long-lived.

        Also, remove() was only added in Java 1.5, which is why that technique is not valid for Lucene. It is also not valid, since you need to clear all values across all threads; remove() only clears the entry for the calling thread.

        The current patch solves the problem suitably for Lucene, at the expense of some performance degradation.

        adrian.tarau Adrian Tarau added a comment -

        Even though the issue is closed, for those who want to know why ThreadLocal had to be fixed: http://www.javaspecialists.eu/archive/Issue164.html

        "Best Practices

        If you must use ThreadLocal, make sure that you remove the value as soon as you are done with it and preferably before you return your thread to a thread pool. Best practice is to use remove() rather than set(null), since that would cause the WeakReference to be removed immediately, together with the value.

        Kind regards

        Heinz"

        mikemccand Michael McCandless added a comment -

        Committed revision 695184.

        mikemccand Michael McCandless added a comment -

        Thanks Chris!

        mikemccand Michael McCandless added a comment -

        OK, super! Thanks for testing, Chris. I'll commit shortly.

        chrislusf Chris Lu added a comment -

        This is the V-shaped memory graph using the patch. This also confirms the leak is fixed.

        chrislusf Chris Lu added a comment - edited

        On second thought, I think this is normal behavior. Like you said, GC may take a while before it really cleans things up. So if I did not wait a while between the close and the open, the little bump should be normal.

        So I would say this bug is fixed. Thanks!

        mikemccand Michael McCandless added a comment -

        Hmm, the sometimes long-lasting bump in used memory is odd.

        When you close the old IndexSearcher & RAMDirectory, do you forcefully dereference them (set all variables that point to them to null) before reopening the new ones? (It seems like you must, since you saw the V shape before LUCENE-1195, but I just want to verify.) If you simply reassign the variable from old to new, then both objects will exist at once until the opening of the new one finishes.

        I think the bump may just be GC taking its time collecting the objects, which should be harmless. Maybe using a WeakReference to a long-lived object causes GC to take longer to collect it?

        Is it possible to use the profiler to look for a reference to the old RAMDirectory while the bump is "up"? That would be a smoking gun that we do still have a reference, unless it's only via the WeakReference.

        Or, could you try lowering the heap size in your JRE to something slightly less than 2X one RAMDirectory (but not so low that the "other stuff" you have in the JRE can't fit)? This would force GC to work harder / more immediately in reclaiming the unreachable objects.
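
        For illustration only (this helper class, the index path handling, and the synchronization policy are assumptions, not code from this issue), the close-and-dereference-first pattern being asked about might look like:

```java
import java.io.IOException;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.store.RAMDirectory;

// Hypothetical sketch of the reopen pattern discussed above: close and forcefully
// dereference the old searcher/directory *before* loading the replacement, so at
// most one RAMDirectory is strongly reachable while the new one is being built.
public class SearcherReopener {
    private RAMDirectory ramDir;
    private IndexSearcher searcher;

    public synchronized void reopen(String indexPath) throws IOException {
        if (searcher != null) {
            searcher.close();
            searcher = null;      // drop the old references first ...
        }
        if (ramDir != null) {
            ramDir.close();
            ramDir = null;
        }
        FSDirectory fsDir = FSDirectory.getDirectory(indexPath);
        try {
            ramDir = new RAMDirectory(fsDir);   // ... then copy the on-disk index into RAM
        } finally {
            fsDir.close();
        }
        searcher = new IndexSearcher(ramDir);
    }
}
```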

        chrislusf Chris Lu added a comment -

        The test repeatedly closes and opens the index.

        After the repeated close/open test run finished, I closed the RAMDirectory-based searcher/reader and switched to FSDirectory, but the RAMDirectory (which is fairly large) was left in memory. I did several rounds of GC, but the RAMDirectory was not cleaned up. At that point I tried to do a memory dump, but an OOM occurred.

        chrislusf Chris Lu added a comment -

        End of the period with 2 RAMDirectory instances in memory. This is also during an index closing, then opening. For no particular reason, the RAMDirectory count dropped back to 1.

        chrislusf Chris Lu added a comment -

        Beginning of the period with 2 RAMDirectory instances in memory. This is also during an index closing, then opening.

        chrislusf Chris Lu added a comment -

        A little bump when closing and opening the index. The test did the close first, then the open.

        chrislusf Chris Lu added a comment -

        I can confirm this patch fixed the memory problem, in general. Thanks!

        However, at the risk of nitpicking, I did notice that with this patch, during the transition between closing the previous index searcher/reader and opening the new one, there is a bump in the memory graph:
        ___/\___
        There are 2 RAMDirectory instances in memory for a short period of time.

        In the previous Lucene version, without LUCENE-1195, the memory graph looks cleaner, with a V dip:
        ____  ____
            \/

        And during one index refresh in the same run, there were 2 RAMDirectory instances in memory for a fairly long time, but this was not repeatable.

        So I suspect the closing is somewhat delayed, and sometimes it could be missed.

        Attaching the screenshot of the memory bump, plus 2 memory graphs showing the beginning and end of the period with 2 RAMDirectory instances in memory (2 graphs are needed because the period is fairly long).

        mikemccand Michael McCandless added a comment -

        Attached patch. All tests pass.

        The patch adds o.a.l.util.CloseableThreadLocal. It's a wrapper around ThreadLocal that wraps the values inside a WeakReference, but then also holds a strong reference to the value (to ensure GC doesn't reclaim it) until you call the close method. On calling close, GC is then free to reclaim all values you had stored, regardless of how long it takes ThreadLocal's implementation to actually release its references.

        There are a couple of places in Lucene where I left the current usage of ThreadLocal.

        First, Analyzer.java uses ThreadLocal to hold reusable token streams. There is no "close" called for Analyzer, so unless we are willing to add a finalizer to call CloseableThreadLocal.close(), I think we can leave it.

        Second, some of the contrib/benchmark tasks use ThreadLocal to store a per-thread DateFormat, which should use tiny memory.
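
        For readers skimming the issue, here is a rough sketch of the idea the patch describes (a simplified approximation, not the committed o.a.l.util.CloseableThreadLocal source):

```java
import java.lang.ref.WeakReference;
import java.util.HashMap;
import java.util.Map;

// Simplified approximation of the approach described above: values handed to the
// underlying ThreadLocal are wrapped in a WeakReference, while a side map keeps
// them strongly reachable per thread until close() is called. After close(), GC
// can reclaim every stored value no matter how slowly ThreadLocal's own map
// expunges its stale entries.
public class CloseableThreadLocalSketch<T> {
    private ThreadLocal<WeakReference<T>> t = new ThreadLocal<WeakReference<T>>();
    private Map<Thread, T> hardRefs = new HashMap<Thread, T>();

    public T get() {
        WeakReference<T> weakRef = t.get();
        return weakRef == null ? null : weakRef.get();
    }

    public void set(T object) {
        t.set(new WeakReference<T>(object));
        synchronized (hardRefs) {
            // Keep the value alive until close(); a fuller implementation would also
            // drop entries for threads that have died (omitted here for brevity).
            hardRefs.put(Thread.currentThread(), object);
        }
    }

    public void close() {
        // Drop the only strong references we hold; the WeakReferences left inside
        // each thread's ThreadLocalMap no longer keep the values alive.
        hardRefs = null;
        t = null;
    }
}
```

        Keeping the synchronization in set() and close() only is consistent with the earlier comment that no synchronized code was added to the get() path.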


          People

          • Assignee:
            mikemccand Michael McCandless
          • Reporter:
            mikemccand Michael McCandless
          • Votes:
            0
          • Watchers:
            0
