Lucene - Core
  1. Lucene - Core
  2. LUCENE-4252

Detect/Fail tests when they leak RAM in static fields

    Details

    • Type: Test Test
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.0, 6.0
    • Component/s: general/test
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      We run our junit tests without firing up a JVM each time.

      But some tests initialize lots of stuff in @BeforeClass and don't properly null it out in an @AfterClass, which can cause a subsequent test in the same JVM to OOM, which is difficult to debug.

      Inspiration for this was me committing Mike's cool TestPostingsFormat, which forgot to do this: then we were seeing OOMs in several jenkins runs.

      We should try to detect these leaks in LuceneTestCase with RAMUsageEstimator and fail the test.

      1. LUCENE-4252.patch
        2 kB
        Robert Muir
      2. LUCENE-4252.patch
        2 kB
        Robert Muir
      3. sfi.patch
        7 kB
        Dawid Weiss

        Activity

        Hide
        Robert Muir added a comment -

        just a prototype, seems to work.

        E.g. if i svn up to r1365256, before I committed the fix to TestPostingsFormat (http://svn.apache.org/viewvc/lucene/dev/trunk/lucene/core/src/test/org/apache/lucene/index/TestPostingsFormat.java?r1=1365257&r2=1365256&pathrev=1365257), then TestPostingsFormat fails like this:

        [junit4:junit4] <JUnit4> says cześć! Master seed: 4A806862EF7FBAA2
        [junit4:junit4] Executing 1 suite with 1 JVM.
        [junit4:junit4] Suite: org.apache.lucene.index.TestPostingsFormat
        [junit4:junit4] OK      0.27s | TestPostingsFormat.test
        [junit4:junit4]    > (@AfterClass output)
        [junit4:junit4]   2> NOTE: test params are: codec=Lucene40: {=MockFixedIntBlock(blockSize=769), vqvfc=MockFixedIntBlock(blockSize=769), ouan=PostingsFormat(name=Memory doPackFST= false), vhhpavin=MockFixedIntBlock(blockSize=769)}, sim=RandomSimilarityProvider(queryNorm=true,coord=true): {}, locale=fi_FI, timezone=SystemV/AST4ADT
        [junit4:junit4]   2> NOTE: Linux 3.2.0-24-generic amd64/Sun Microsystems Inc. 1.6.0_24 (64-bit)/cpus=8,threads=1,free=145473456,total=317390848
        [junit4:junit4]   2> NOTE: All tests run in this JVM: [TestPostingsFormat]
        [junit4:junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=TestPostingsFormat -Dtests.seed=4A806862EF7FBAA2 -Dtests.slow=true -Dtests.locale=fi_FI -Dtests.timezone=SystemV/AST4ADT -Dtests.file.encoding=ISO-8859-1
        [junit4:junit4]   2> 
        [junit4:junit4] ERROR   0.00s | TestPostingsFormat (suite)
        [junit4:junit4]    > Throwable #1: java.lang.RuntimeException: Please fix the static leaks in your test in a @AfterClass, your test seems to hang on to approximately 125704200 bytes.
        [junit4:junit4]    > 	at org.apache.lucene.util.TestRuleSetupAndRestoreClassEnv.after(TestRuleSetupAndRestoreClassEnv.java:282)
        
        Show
        Robert Muir added a comment - just a prototype, seems to work. E.g. if i svn up to r1365256, before I committed the fix to TestPostingsFormat ( http://svn.apache.org/viewvc/lucene/dev/trunk/lucene/core/src/test/org/apache/lucene/index/TestPostingsFormat.java?r1=1365257&r2=1365256&pathrev=1365257 ), then TestPostingsFormat fails like this: [junit4:junit4] <JUnit4> says cześć! Master seed: 4A806862EF7FBAA2 [junit4:junit4] Executing 1 suite with 1 JVM. [junit4:junit4] Suite: org.apache.lucene.index.TestPostingsFormat [junit4:junit4] OK 0.27s | TestPostingsFormat.test [junit4:junit4] > (@AfterClass output) [junit4:junit4] 2> NOTE: test params are: codec=Lucene40: {=MockFixedIntBlock(blockSize=769), vqvfc=MockFixedIntBlock(blockSize=769), ouan=PostingsFormat(name=Memory doPackFST= false), vhhpavin=MockFixedIntBlock(blockSize=769)}, sim=RandomSimilarityProvider(queryNorm=true,coord=true): {}, locale=fi_FI, timezone=SystemV/AST4ADT [junit4:junit4] 2> NOTE: Linux 3.2.0-24-generic amd64/Sun Microsystems Inc. 1.6.0_24 (64-bit)/cpus=8,threads=1,free=145473456,total=317390848 [junit4:junit4] 2> NOTE: All tests run in this JVM: [TestPostingsFormat] [junit4:junit4] 2> NOTE: reproduce with: ant test -Dtestcase=TestPostingsFormat -Dtests.seed=4A806862EF7FBAA2 -Dtests.slow=true -Dtests.locale=fi_FI -Dtests.timezone=SystemV/AST4ADT -Dtests.file.encoding=ISO-8859-1 [junit4:junit4] 2> [junit4:junit4] ERROR 0.00s | TestPostingsFormat (suite) [junit4:junit4] > Throwable #1: java.lang.RuntimeException: Please fix the static leaks in your test in a @AfterClass, your test seems to hang on to approximately 125704200 bytes. [junit4:junit4] > at org.apache.lucene.util.TestRuleSetupAndRestoreClassEnv.after(TestRuleSetupAndRestoreClassEnv.java:282)
        Hide
        Robert Muir added a comment -

        I think this patch should probably just count all the statics after the test instead of
        subtracting from beforeClass: we don't care. We might have tests creating huge things in static {}
        which it won't catch today.

        Show
        Robert Muir added a comment - I think this patch should probably just count all the statics after the test instead of subtracting from beforeClass: we don't care. We might have tests creating huge things in static {} which it won't catch today.
        Hide
        Uwe Schindler added a comment -

        Cool thing! +1 to add this check

        Show
        Uwe Schindler added a comment - Cool thing! +1 to add this check
        Hide
        Dawid Weiss added a comment -

        Static initializers will be very difficult because they're executed before the actual test begins (a side effect of JUnit architecture). You could roll out each test in a separate class loader to overcome this but it'd create large overhead on memory (class GCs, reinitializations, etc.).

        I like this nonetheless - it should catch a lot of common use case scenarios. Things like initializing static variables in other classes (or thread locals) will remain a problem but we can probably deal with these later on somehow.

        The only thing is that I'd rather move it to a separate class (and add it to the class rule chain). While it may seem like adding unnecessary complexity putting all the checks in a single rule will eventually lead to a tangled blob that is hard to read and analyze.

        I may do it when I come back, Robert – assign to me or create a new issue.

        Show
        Dawid Weiss added a comment - Static initializers will be very difficult because they're executed before the actual test begins (a side effect of JUnit architecture). You could roll out each test in a separate class loader to overcome this but it'd create large overhead on memory (class GCs, reinitializations, etc.). I like this nonetheless - it should catch a lot of common use case scenarios. Things like initializing static variables in other classes (or thread locals) will remain a problem but we can probably deal with these later on somehow. The only thing is that I'd rather move it to a separate class (and add it to the class rule chain). While it may seem like adding unnecessary complexity putting all the checks in a single rule will eventually lead to a tangled blob that is hard to read and analyze. I may do it when I come back, Robert – assign to me or create a new issue.
        Hide
        Michael McCandless added a comment -

        +1, thanks Robert!

        Show
        Michael McCandless added a comment - +1, thanks Robert!
        Hide
        Robert Muir added a comment -

        I already set this threshold as low as 512KB and was fixing the minor leaks last night...
        lemme update the patch (still at 10MB) with my idea to catch static initializers.

        Show
        Robert Muir added a comment - I already set this threshold as low as 512KB and was fixing the minor leaks last night... lemme update the patch (still at 10MB) with my idea to catch static initializers.
        Hide
        Robert Muir added a comment -

        The only thing is that I'd rather move it to a separate class (and add it to the class rule chain). While it may seem like adding unnecessary complexity putting all the checks in a single rule will eventually lead to a tangled blob that is hard to read and analyze.

        We could do this, I didnt want to make all test stacktraces longer though

        Show
        Robert Muir added a comment - The only thing is that I'd rather move it to a separate class (and add it to the class rule chain). While it may seem like adding unnecessary complexity putting all the checks in a single rule will eventually lead to a tangled blob that is hard to read and analyze. We could do this, I didnt want to make all test stacktraces longer though
        Hide
        Robert Muir added a comment -

        heres the update patch, it just doesnt do the part before, and looks at the static fields at the end.

        we dont care how they were initialized, we dont want them hogging up RAM!

        Show
        Robert Muir added a comment - heres the update patch, it just doesnt do the part before, and looks at the static fields at the end. we dont care how they were initialized, we dont want them hogging up RAM!
        Hide
        Robert Muir added a comment -

        assigning to Dawid when he gets back.

        its not a 100% check and not even a total must we commit it: but i was reviewing this stuff manually before so its saves time.

        Show
        Robert Muir added a comment - assigning to Dawid when he gets back. its not a 100% check and not even a total must we commit it: but i was reviewing this stuff manually before so its saves time.
        Hide
        Dawid Weiss added a comment -

        I didnt want to make all test stacktraces longer though

        I honestly don't think this is such a big issue – stack traces are for inspecting what went wrong; if something did you look at it top-to-bottom anyway. Some people are uncomfortable seing a long stack trace (or a nested one); we can apply some filtering to them but I'd say let's just keep them verbose. Better show more than needed than hide what's important.

        with my idea to catch static initializers.

        I looked at the patch but I didn't see where this idea went; perhaps I missed something. Anyway, not important for now, I'll revisit after I'm back home.

        Show
        Dawid Weiss added a comment - I didnt want to make all test stacktraces longer though I honestly don't think this is such a big issue – stack traces are for inspecting what went wrong; if something did you look at it top-to-bottom anyway. Some people are uncomfortable seing a long stack trace (or a nested one); we can apply some filtering to them but I'd say let's just keep them verbose. Better show more than needed than hide what's important. with my idea to catch static initializers. I looked at the patch but I didn't see where this idea went; perhaps I missed something. Anyway, not important for now, I'll revisit after I'm back home.
        Hide
        Robert Muir added a comment -

        I looked at the patch but I didn't see where this idea went; perhaps I missed something. Anyway, not important for now, I'll revisit after I'm back home.

        Well unlike the first patch, it does no subtraction, it just counts the ram usage estimation of all static fields. if they are null, this is of course 0. we don't really care where they were created.

        so now if i add this line to TestDemo:

        static long huge[] = new long[2 * 1024 * 1024];
        

        It fails like this:

        [junit4:junit4] ERROR   0.00s | TestDemo (suite)
        [junit4:junit4]    > Throwable #1: java.lang.RuntimeException: Please fix the static leaks in your test in a @AfterClass, your test seems to hang on to approximately 16777232 bytes.
        [junit4:junit4]    > 	at org.apache.lucene.util.TestRuleSetupAndRestoreClassEnv.after(TestRuleSetupAndRestoreClassEnv.java:253)
        
        Show
        Robert Muir added a comment - I looked at the patch but I didn't see where this idea went; perhaps I missed something. Anyway, not important for now, I'll revisit after I'm back home. Well unlike the first patch, it does no subtraction, it just counts the ram usage estimation of all static fields. if they are null, this is of course 0. we don't really care where they were created. so now if i add this line to TestDemo: static long huge[] = new long [2 * 1024 * 1024]; It fails like this: [junit4:junit4] ERROR 0.00s | TestDemo (suite) [junit4:junit4] > Throwable #1: java.lang.RuntimeException: Please fix the static leaks in your test in a @AfterClass, your test seems to hang on to approximately 16777232 bytes. [junit4:junit4] > at org.apache.lucene.util.TestRuleSetupAndRestoreClassEnv.after(TestRuleSetupAndRestoreClassEnv.java:253)
        Hide
        Dawid Weiss added a comment -

        Yep, looks good to me and makes sense. When I was thinking about static initializers I had in mind twisted scenarios like:

        static {
          KlassB.staticField = new byte [gazillion];
        }
        

        but this will be nearly impossible to detect given that you can fan out from a static initializer to other classes etc. Doesn't make sense to try to make it super complex.

        What we could do though is to make it not only a lint rule but also an additional cleanup rule, similar to system property invariant guard and cleanup. We could then add this automatic static field cleanup (on the suite class) to the set of class rules on LuceneTestCase. Perhaps it'd make it cleaner compared to manual cleanups done in @AfterClass or similar hooks?

        Show
        Dawid Weiss added a comment - Yep, looks good to me and makes sense. When I was thinking about static initializers I had in mind twisted scenarios like: static { KlassB.staticField = new byte [gazillion]; } but this will be nearly impossible to detect given that you can fan out from a static initializer to other classes etc. Doesn't make sense to try to make it super complex. What we could do though is to make it not only a lint rule but also an additional cleanup rule, similar to system property invariant guard and cleanup. We could then add this automatic static field cleanup (on the suite class) to the set of class rules on LuceneTestCase. Perhaps it'd make it cleaner compared to manual cleanups done in @AfterClass or similar hooks?
        Hide
        Dawid Weiss added a comment - - edited

        This is useful in general. I've added this to randomized testing package, modifying Robert's code slightly to:

        • exclude static final fields (typically constants that cannot be erased),
        • exclude fields of primitive types,
        • account for superclasses of the suite class (optional but turned on by default),
        • include sorted RAM usage info for accounted fields in the assertion thrown.

        Exclusion rules are customizable via rule subclassing (accept(Field) method).

        Show
        Dawid Weiss added a comment - - edited This is useful in general. I've added this to randomized testing package, modifying Robert's code slightly to: exclude static final fields (typically constants that cannot be erased), exclude fields of primitive types, account for superclasses of the suite class (optional but turned on by default), include sorted RAM usage info for accounted fields in the assertion thrown. Exclusion rules are customizable via rule subclassing (accept(Field) method).
        Hide
        Commit Tag Bot added a comment -

        [branch_4x commit] Dawid Weiss
        http://svn.apache.org/viewvc?view=revision&revision=1383842

        LUCENE-4252: Detect/Fail tests when they leak RAM in static fields.

        Show
        Commit Tag Bot added a comment - [branch_4x commit] Dawid Weiss http://svn.apache.org/viewvc?view=revision&revision=1383842 LUCENE-4252 : Detect/Fail tests when they leak RAM in static fields.
        Hide
        Uwe Schindler added a comment -

        Closed after release.

        Show
        Uwe Schindler added a comment - Closed after release.

          People

          • Assignee:
            Dawid Weiss
            Reporter:
            Robert Muir
          • Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development