Lucene - Core
  1. Lucene - Core
  2. LUCENE-4451

Memory leak per unique thread caused by RandomizedContext.contexts static map

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.1, 6.0
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      In digging on the hard-to-understand OOMEs with
      TestDirectPostingsFormat ... I found (thank you YourKit) that
      RandomizedContext (in randomizedtesting JAR) seems to be holding onto
      all threads created by the test. The test does create many very short
      lived threads (testing the thread safety of the postings format), in
      BasePostingsFormatTestCase.testTerms), and somehow these seem to tie
      up a lot (~100 MB) of RAM in RandomizedContext.contexts static map.

      For now I've disabled all thread testing (committed false && inside
      BPFTC.testTerms), but hopefully we can fix the root cause here, eg
      when a thread exits can we clear it from that map?

        Activity

        Hide
        Dawid Weiss added a comment -

        This is a border case. The problem is most likely not the threads themselves but also the fact that they keep refs to some other data structures (in thread locals)? I'll see what I can do not to keep hard refs to those thread/context pairs though. I remember there was a reason I didn't use soft refs for this from the beginning but I can't tell you right now what it was.

        Show
        Dawid Weiss added a comment - This is a border case. The problem is most likely not the threads themselves but also the fact that they keep refs to some other data structures (in thread locals)? I'll see what I can do not to keep hard refs to those thread/context pairs though. I remember there was a reason I didn't use soft refs for this from the beginning but I can't tell you right now what it was.
        Hide
        Michael McCandless added a comment -

        OK that's a good clue: the Threads are created as anon classes, which means they are holding references to lots of extra stuff. Let me try making them static classes, and explicitly null out the "stuff" when they are done running...

        Show
        Michael McCandless added a comment - OK that's a good clue: the Threads are created as anon classes, which means they are holding references to lots of extra stuff. Let me try making them static classes, and explicitly null out the "stuff" when they are done running...
        Hide
        Dawid Weiss added a comment -

        I looked at the code and I don't have an easy fix for now yet. The problem is that there are circular reference needs between Threads, Randoms and the runner so that we can assert that Random instances issued for a thread are not reused on other threads (or outside of a given test's lifespan). This leads to a cyclic dependency between Thread->PerThreadContext->AssertingRandom->Thread so even if you put a weak hash map for Thread->PerThreadContext it'll still keep a hard reference to the thread it's bound to.

        I could make AssertingRandom store a weak/soft reference to the thread it's bound to but I'm kind of afraid it'll affect the performance.

        Could we temporarily make a pool of threads for this test and reuse these? I'll be thinking of a workaround but it's going to take me some time.

        Show
        Dawid Weiss added a comment - I looked at the code and I don't have an easy fix for now yet. The problem is that there are circular reference needs between Threads, Randoms and the runner so that we can assert that Random instances issued for a thread are not reused on other threads (or outside of a given test's lifespan). This leads to a cyclic dependency between Thread->PerThreadContext->AssertingRandom->Thread so even if you put a weak hash map for Thread->PerThreadContext it'll still keep a hard reference to the thread it's bound to. I could make AssertingRandom store a weak/soft reference to the thread it's bound to but I'm kind of afraid it'll affect the performance. Could we temporarily make a pool of threads for this test and reuse these? I'll be thinking of a workaround but it's going to take me some time.
        Hide
        Dawid Weiss added a comment -

        I pushed a tentative fix for this (includes a test case).
        https://github.com/carrotsearch/randomizedtesting/issues/127

        I'd still like to hold for some time to make sure it's the best way to solve it.

        Show
        Dawid Weiss added a comment - I pushed a tentative fix for this (includes a test case). https://github.com/carrotsearch/randomizedtesting/issues/127 I'd still like to hold for some time to make sure it's the best way to solve it.
        Hide
        Michael McCandless added a comment -

        Could we temporarily make a pool of threads for this test and reuse these?

        I already committed a fix (to use static Thread subclass, and to null out the "heavy stuff" after the thread is done), and it seemed to work around the issue in my testing ... however, I don't re-use (pool) the threads.

        Show
        Michael McCandless added a comment - Could we temporarily make a pool of threads for this test and reuse these? I already committed a fix (to use static Thread subclass, and to null out the "heavy stuff" after the thread is done), and it seemed to work around the issue in my testing ... however, I don't re-use (pool) the threads.
        Hide
        Dawid Weiss added a comment -

        It'll still collect references to all these threads (and whatever they may be holding onto) so eventually it'll OOM if you create a really large number of them. I'll push the fix above in the next release; holding to Thread instances seems to be doing more evil than good.

        Show
        Dawid Weiss added a comment - It'll still collect references to all these threads (and whatever they may be holding onto) so eventually it'll OOM if you create a really large number of them. I'll push the fix above in the next release; holding to Thread instances seems to be doing more evil than good.
        Hide
        Dawid Weiss added a comment -

        Patch against the trunk updating rr to 2.0.2. I tested very quickly and at least one seed that was failing with an OOM now passes. I commented out GC-helpers Mike added to make it even harder (and un-ignored TestDirectPostingsFormat).

        Mike could you take a look and maybe beast it a bit? It's getting late on my side – feel free to commit if everything is all right.

        Show
        Dawid Weiss added a comment - Patch against the trunk updating rr to 2.0.2. I tested very quickly and at least one seed that was failing with an OOM now passes. I commented out GC-helpers Mike added to make it even harder (and un-ignored TestDirectPostingsFormat). Mike could you take a look and maybe beast it a bit? It's getting late on my side – feel free to commit if everything is all right.
        Hide
        Michael McCandless added a comment -

        +1, seems to work great! TestDirectPF -mult 3 -nightly quickly OOMEs on trunk if I comment out the GC helper nullings, but w/ the patch I ran for 24 iters before OOME (this test separately has OOME problems). So this seems like good progress!

        Thanks Dawid, I'll commit!

        Show
        Michael McCandless added a comment - +1, seems to work great! TestDirectPF -mult 3 -nightly quickly OOMEs on trunk if I comment out the GC helper nullings, but w/ the patch I ran for 24 iters before OOME (this test separately has OOME problems). So this seems like good progress! Thanks Dawid, I'll commit!
        Hide
        Michael McCandless added a comment -

        Thanks Dawid!

        Show
        Michael McCandless added a comment - Thanks Dawid!
        Hide
        Dawid Weiss added a comment -

        Thanks Mike. If you're looking into OOMs with YourKit then try to save a differential snapshot - this helps greatly in analysis typically. Also, keep those snapshots if you think something in the runner may be the cause (I have YourKit as well).

        Show
        Dawid Weiss added a comment - Thanks Mike. If you're looking into OOMs with YourKit then try to save a differential snapshot - this helps greatly in analysis typically. Also, keep those snapshots if you think something in the runner may be the cause (I have YourKit as well).
        Hide
        Commit Tag Bot added a comment -

        [branch_4x commit] Michael McCandless
        http://svn.apache.org/viewvc?view=revision&revision=1392543

        LUCENE-4451: upgrade to randomizedtesting-2.0.2.jar to reduce RAM overhead for tracking new threads

        Show
        Commit Tag Bot added a comment - [branch_4x commit] Michael McCandless http://svn.apache.org/viewvc?view=revision&revision=1392543 LUCENE-4451 : upgrade to randomizedtesting-2.0.2.jar to reduce RAM overhead for tracking new threads
        Hide
        Commit Tag Bot added a comment -

        [branch_4x commit] Michael McCandless
        http://svn.apache.org/viewvc?view=revision&revision=1392024

        LUCENE-4451: turn threads back on

        Show
        Commit Tag Bot added a comment - [branch_4x commit] Michael McCandless http://svn.apache.org/viewvc?view=revision&revision=1392024 LUCENE-4451 : turn threads back on

          People

          • Assignee:
            Dawid Weiss
            Reporter:
            Michael McCandless
          • Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development