SOLR-5022: PermGen exhausted test failures on Jenkins

    Details

    • Type: Test
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: 4.9, 5.0
    • Component/s: Tests
    • Labels: None

    Attachments

    1. SOLR-5022.patch (2 kB) - Uwe Schindler
    2. intern-count-win.txt (67 kB) - Dawid Weiss
    3. SOLR-5022-permgen.patch (2 kB) - Uwe Schindler
    4. SOLR-5022-permgen.patch (1 kB) - Uwe Schindler

      Activity

      Mark Miller added a comment -

      Comment from dev list:

      Looks like we currently don't set the max perm gen for tests, so you get the default - I think we want to change that regardless - we don't want it to vary IMO - it should work like Xmx.

      I think we should just set it to 128 mb, and these tests should have plenty of room to run.

      Robert Muir added a comment -

      -1 to increasing permgen.

      Solr ran fine without it before; I want to know why something wants more permgen, and for what: classes, interned strings, what exactly?

      Robert Muir added a comment -

      This isn't the heap, where you give things "plenty of room".

      This is a memory leak that should be fixed.

      Uwe Schindler added a comment -

      The problem with raising permgen is:

      • It's HotSpot-specific, so it does not work with other JVMs
      • It's no longer available in Java 8

      I would really prefer to tune the tests and maybe not create so many nodes in the cloud tests. It looks like the bug happens more often with higher test multiplier (-Dtests.multiplier=3), so maybe we can really tune that.
      If we want to raise permgen, we have to do it in a similar way to how we enable the heap dumps - with lots of <condition/> tasks in ANT...

      Uwe Schindler added a comment -

      One thing in addition:
      We currently have an assumeFalse() in the hadoop tests that checks for Windows and FreeBSD. But the latter, FreeBSD, is bogus: only the configuration of the Jenkins FreeBSD machine is wrong, not FreeBSD in general (the blackhole must be enabled).

      I would prefer to add a property to ANT "tests.disable.hadoop" that defaults to "true" (on Windows) and "false" elsewhere. In the tests we can make an assume on the existence of this property. Or alternatively put all hadoop tests in a test group that can be disabled (I would prefer the latter, maybe Dawid Weiss can help).

      On the FreeBSD Jenkins we would set this property to "true"; on other Jenkins machines we can autodetect it (Windows, or other). And if someone does not want to run Hadoop tests at all, they can disable them.

      Dawid Weiss added a comment -

      Or alternatively put all hadoop tests in a test group that can be disabled

      This shouldn't be a problem – create a new test group (an annotation marked with a meta-annotation, see existing code of BadApple for example), enable or disable the test group by default, override via ANT.

      The group would be disabled/enabled via ant's condition and a value passed via system property, much like it is the case with badapple and nightly. There is no way to evaluate a test group's execution status at runtime; an alternative here is to use a before-suite-rule and an assumption in there.
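
      A minimal sketch of what such a test-group annotation could look like, modeled on the existing BadApple pattern (the annotation name and the "tests.hdfs" property are assumptions, not existing code):

        import java.lang.annotation.Documented;
        import java.lang.annotation.ElementType;
        import java.lang.annotation.Inherited;
        import java.lang.annotation.Retention;
        import java.lang.annotation.RetentionPolicy;
        import java.lang.annotation.Target;
        import com.carrotsearch.randomizedtesting.annotations.TestGroup;

        /** Hypothetical marker for suites that start an embedded HDFS cluster. */
        @Documented
        @Inherited
        @Retention(RetentionPolicy.RUNTIME)
        @Target({ElementType.TYPE, ElementType.METHOD})
        @TestGroup(enabled = true, sysProperty = "tests.hdfs")
        public @interface HdfsTest {
        }

      ANT (or any caller) could then toggle the group for a given run with something like -Dtests.hdfs=false, much as badapple and nightly are toggled today.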

      Uwe Schindler added a comment - - edited

      Maybe I know why the permgen issues do not happen for all of us! The reason is:

      • Something seems to eat permgen by interning strings! Those interned strings are never freed until the JVM dies.
      • If you run with many CPUs, the test runner runs tests in multiple parallel JVMs, so every JVM runs fewer tests.

      ...the Jenkins server on MacOSX runs with one JVM only (because the virtual box has only 2 virtual CPUs), so all tests have to share the permgen. Windows always passes because no hadoop tests run there. And Linux fails less often (2 parallel JVMs). On FreeBSD we also don't run hadoop tests.

      We have to find out what eats all the permgen - not by loading classes, but by interning strings - because that's the issue here. My idea would be: I will run forbidden-apis on all JAR files that were added to Solr and forbid the String#intern() signature. This should show us very quickly who interns strings, and we can open bug reports or hot-patch those jar files.
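
      Purely as an illustration of the idea (this is not forbidden-apis itself, and every name here is an assumption), such a scan for String#intern() call sites could also be done directly with the ASM bytecode library:

        import java.util.Enumeration;
        import java.util.jar.JarEntry;
        import java.util.jar.JarFile;

        import org.objectweb.asm.ClassReader;
        import org.objectweb.asm.ClassVisitor;
        import org.objectweb.asm.MethodVisitor;
        import org.objectweb.asm.Opcodes;

        public class InternScanner {
          public static void main(String[] args) throws Exception {
            try (JarFile jar = new JarFile(args[0])) {
              Enumeration<JarEntry> entries = jar.entries();
              while (entries.hasMoreElements()) {
                final JarEntry entry = entries.nextElement();
                if (!entry.getName().endsWith(".class")) continue;
                new ClassReader(jar.getInputStream(entry)).accept(new ClassVisitor(Opcodes.ASM9) {
                  @Override
                  public MethodVisitor visitMethod(int access, final String method, String desc,
                                                   String sig, String[] exceptions) {
                    return new MethodVisitor(Opcodes.ASM9) {
                      @Override
                      public void visitMethodInsn(int opcode, String owner, String name,
                                                  String mdesc, boolean itf) {
                        // report every call site of java.lang.String#intern()
                        if ("java/lang/String".equals(owner) && "intern".equals(name)) {
                          System.out.println(entry.getName() + " " + method + " calls String.intern()");
                        }
                      }
                    };
                  }
                }, ClassReader.SKIP_DEBUG | ClassReader.SKIP_FRAMES);
              }
            }
          }
        }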

      Mark Miller added a comment -

      solr ran fine without it before,

      It runs fine now as well - requiring more perm gen in tests is not a Solr bug - sorry. Simply saying words doesn't make things true.

      For running the clover target, we set the perm size to 192m - quick, fix that bug! Oh wait, that's a stupid thing to say...

      Mark Miller added a comment -

      It looks like the bug happens more often with higher test multiplier (-Dtests.multiplier=3), so maybe we can really tune that.

      Yes, we could make our tests shittier rather than give them the required resources to run, but that's a pretty silly trade.

      Mark Miller added a comment -

      It's no longer available in Java 8

      And do you see the problem on Java 8 runs?

      Uwe Schindler added a comment -

      Here is a patch, not for the permgen issue, but to make Jenkins more flexible. A sysprop -Dtests.disableHdfs=true is now supported. It defaults to true on Windows.

      The good thing: if you have Cygwin, you can enable them now.

      I will commit this as a first step to make the Hdfs stuff more flexible. The ASF Jenkins server gets this sysprop hardcoded into the Jenkins config (like tests.jettyConnector).
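
      For illustration, a suite could honor such a sysprop with a plain JUnit assumption; the class and method names below are made up and this is not the committed patch:

        import org.junit.Assume;
        import org.junit.BeforeClass;
        import org.junit.Test;

        public class HdfsSmokeTest {  // hypothetical suite name

          @BeforeClass
          public static void checkHdfsEnabled() {
            // Skips the whole suite when the build passes -Dtests.disableHdfs=true.
            Assume.assumeTrue(!Boolean.getBoolean("tests.disableHdfs"));
          }

          @Test
          public void startsMiniCluster() {
            // ... start the embedded HDFS cluster and run assertions here ...
          }
        }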

      Mark Miller added a comment -

      the Jenkins server on MacOSX runs with one JVM only (because the virtual box has only 2 virtual CPUs). So all tests have to share the permgen. Windows always passes because no hadoop tests run there. And Linux fails less often (2 parallel JVMs).

      Yes, this would match what I have seen in the wild - on the machines that have fewer cores, I was more likely to see perm gen issues with certain aggressive tests. With my 6-core machines, where I run with 8 JVMs, I have never even remotely seen an issue.

      Uwe Schindler added a comment -

      And do you see the problem on Java 8 runs?

      No, and also not on JRockit or IBM J9. But MacOSX only has Java 6 and Java 7 at the moment, so it's not 100% certain.

      Uwe Schindler added a comment -

      For running the clover target, we set the perm size to 192m - quick, fix that bug! Oh wait, that's a stupid thing to say...

      Clover only works on Oracle JDKs...

      Dawid Weiss added a comment -

      I wouldn't want to argue whether increasing permgen is a good fix or not, but it's an interesting debugging problem on its own. I've just run the Solr tests with an aspect that intercepts intern() calls. I'll post the results here once the tests complete. Let's see what we can get.

      Uwe Schindler added a comment -

      Thanks Dawid, so I don't need to set up forbidden-apis for that! That was my first idea for finding the places that call intern().

      Mark Miller added a comment -

      Patch looks good Uwe - +1 on that approach.

      increasing permgen is a good fix or not,

      I would call it a workaround more than a fix - longer term it would be nice to see the root cause addressed - but considering it would seem to involve code in another project, you have to work from a short-term and 'possible' long-term perspective.

      ASF subversion and git services added a comment -

      Commit 1501278 from Uwe Schindler
      [ https://svn.apache.org/r1501278 ]

      SOLR-5022: Make it possible to disable HDFS tests on ANT command line (so ASF Jenkins can use it). Windows is disabled by default, too.

      ASF subversion and git services added a comment -

      Commit 1501279 from Uwe Schindler
      [ https://svn.apache.org/r1501279 ]

      Merged revision(s) 1501278 from lucene/dev/trunk:
      SOLR-5022: Make it possible to disable HDFS tests on ANT command line (so ASF Jenkins can use it). Windows is disabled by default, too.

      ASF subversion and git services added a comment -

      Commit 1501281 from Uwe Schindler
      [ https://svn.apache.org/r1501281 ]

      Merged revision(s) 1501278 from lucene/dev/trunk:
      SOLR-5022: Make it possible to disable HDFS tests on ANT command line (so ASF Jenkins can use it). Windows is disabled by default, too.

      Dawid Weiss added a comment -

      This is a count/uniq of a full run from a Windows box. I forgot it won't run Hadoop tests in this mode – will retry on a Mac this evening (preemptive interrupt from kids).

      Dawid Weiss added a comment -

      Eh... those Solr tests run foreeeeever (90 minutes using a single JVM). I ran the code on the 4.x branch and I honestly don't see anything being interned in Hadoop. It might be interning something indirectly via Java system classes (which are not aspect-woven) but I doubt it.

      The full execution log is here:
      http://www.cs.put.poznan.pl/dweiss/tmp/full.log.gz

      and the interning stats are here (first column is the # of calls, then the origin class and the interned string):
      http://www.cs.put.poznan.pl/dweiss/tmp/log.stats

      A few libraries intern strings heavily:

      org.apache.xmlbeans.*
      org.apache.velocity.*
      

      but a lot of calls comes from Lucene itself:

      org.apache.lucene.codecs.lucene3x.TermBuffer
      org.apache.lucene.codecs.lucene3x.Lucene3xFields$PreTermsEnum
      

      When you look at the stats file overall though, ALL the interned strings shouldn't take more than 1.4M (that's the size of unique strings and additional boilerplate).

      So no luck yet. Unless you can explain what's happening based on that output.

      The next step for me is to log permgen use before/after each test and see where memory is consumed and how. I'll do it tomorrow, perhaps on a different machine (it really takes ages to run those tests).

      Dawid Weiss added a comment -

      One more thing – Uwe, do you remember the seed/command line to reproduce that permgen error (on a Mac)?

      Uwe Schindler added a comment - - edited

      Dawid:

      Lucene branch_4x, run http://jenkins.thetaphi.de/job/Lucene-Solr-4.x-MacOSX/618/consoleFull, heapdumps: http://jenkins.thetaphi.de/job/Lucene-Solr-4.x-MacOSX/618/artifact/heapdumps/ (I set this build to be sticky, so you can download heapdumps)

      [Lucene-Solr-4.x-MacOSX] $ /bin/sh -xe /var/folders/qg/h2dfw5s161s51l2bn79mrb7r0000gn/T/hudson1681176734157627309.sh
      + echo Using JDK: 64bit/jdk1.6.0 -XX:+UseCompressedOops -XX:+UseParallelGC
      Using JDK: 64bit/jdk1.6.0 -XX:+UseCompressedOops -XX:+UseParallelGC
      + /Users/jenkins/tools/java/64bit/jdk1.6.0/bin/java -XX:+UseCompressedOops -XX:+UseParallelGC -version
      java version "1.6.0_51"
      Java(TM) SE Runtime Environment (build 1.6.0_51-b11-457-11M4509)
      Java HotSpot(TM) 64-Bit Server VM (build 20.51-b01-457, mixed mode)
      [Lucene-Solr-4.x-MacOSX] $ /Users/jenkins/jenkins-slave/tools/hudson.tasks.Ant_AntInstallation/ANT_1.8.2/bin/ant "-Dargs=-XX:+UseCompressedOops -XX:+UseParallelGC" -Dtests.jvms=1 jenkins-hourly
      

      Master-Seed was: 143E6CCF7E42064B

      Uwe Schindler added a comment -

      Lucene 3 interned field names, so the 3.x codec does this, too:

      org.apache.lucene.codecs.lucene3x.TermBuffer
      org.apache.lucene.codecs.lucene3x.Lucene3xFields$PreTermsEnum

      This should be fine.

      When you look at the stats file overall though, ALL the interned strings shouldn't take more than 1.4M (that's the size of unique strings and additional boilerplate).

      Too bad, so I have no idea anymore. No crazy classloaders going amok, no interned strings - what else can fill permgen?

      ASF subversion and git services added a comment -

      Commit 1501595 from Uwe Schindler
      [ https://svn.apache.org/r1501595 ]

      SOLR-5022: Pass-through disableHdfs to Maven Surefire

      ASF subversion and git services added a comment -

      Commit 1501596 from Uwe Schindler
      [ https://svn.apache.org/r1501596 ]

      Merged revision(s) 1501595 from lucene/dev/trunk:
      SOLR-5022: Pass-through disableHdfs to Maven Surefire

      ASF subversion and git services added a comment -

      Commit 1501597 from Uwe Schindler
      [ https://svn.apache.org/r1501597 ]

      Merged revision(s) 1501595 from lucene/dev/trunk:
      SOLR-5022: Pass-through disableHdfs to Maven Surefire

      Dawid Weiss added a comment -

      Ok, thanks Uwe. I'll keep digging.

      ASF subversion and git services added a comment -

      Commit 1501678 from Uwe Schindler
      [ https://svn.apache.org/r1501678 ]

      SOLR-5022: Make the Maven build also automatically populate the tests.disableHdfs property by a build profile. Otherwise the maven build fails by default on Windows.

      Dawid Weiss added a comment -

      It's been both fun and a learning experience debugging this. I have good news and bad news:

      • the good news is: it's not a memory leak,
      • the bad news is: it's not a memory leak

      the debugging process

      Clearly permgen is one of the most wicked JVM features - it's damn hard to figure out what its
      content really is (I didn't find a way to dump it from within a running process without invoking
      the debugging interface, which in turn starts its own threads, etc.).

      The way I approached the problem (which may be useful for future reference) is as follows:

      • I wrote a short aspect that injects itself before any String.intern is called:
            pointcut targetMethod(): call(String java.lang.String.intern());
        
            before() : targetMethod()
            {
                final JoinPoint jp = thisJoinPoint;
                System.out.println("String#intern() from: " 
                    + jp.getSourceLocation().getWithinType() + " => "
                    + jp.getTarget());
            }
        
      • then I added a Before and After hook (executed before/after each test) that dumped memory pools:
                System.out.println("Memdump#from: " 
                    + this.getClass().getName() + " => ");
        
                for (MemoryPoolMXBean bean : ManagementFactory.getMemoryPoolMXBeans()) {
                    MemoryUsage usage = bean.getUsage();
                    System.out.println(
                        String.format(Locale.ENGLISH,
                            "%20s - I:%7.1f U:%7.1f M:%7.1f",
                            bean.getName(),
                            usage.getInit() / (1024 * 1024.0d),
                            usage.getUsed() / (1024 * 1024.0d),
                            usage.getMax()  / (1024 * 1024.0d)));
                }
        
      • then I ran solr test in one JVM, with the following parameters:
        ant -Dtests.seed=143E6CCF7E42064B 
            -Dtests.leaveTemporary=true 
            -Dtests.jvms=1 
            -Dargs="-javaagent:aspectjweaver.jar -XX:+UseCompressedOops -XX:+UseParallelGC -XX:+TraceClassLoading"
            test-core
        

        I had to modify common-build.xml to include aspectj classpath entries (and the aspect itself) because
        I couldn't get it to work by passing -cp via the args parameter (didn't look too deeply since it's a hack).

      • I again modified common-build.xml and added:
        sysouts="true" jvmoutputaction="pipe,ignore"
        

        to junit4:junit4 task's attributes so that all output is emitted to temporary files under a build folder.

      the results

      From the dumped output streams we have the following weave info indicating which methods run String.intern:

      $ grep "String.intern(" junit4-J0-20130710_122632_726.syserr
      
      in Type 'com.ctc.wstx.util.SymbolTable'
      in Type 'com.ctc.wstx.util.SymbolTable'
      in Type 'com.ctc.wstx.util.InternCache'
      in Type 'org.apache.solr.response.JSONWriter'
      in Type 'org.apache.lucene.codecs.lucene3x.TermBuffer'
      in Type 'org.apache.lucene.codecs.lucene3x.Lucene3xFields$PreTermsEnum'
      in Type 'org.apache.solr.common.luke.FieldFlag'
      in Type 'org.apache.solr.search.DocSetPerf'
      in Type 'org.joda.time.tz.ZoneInfoProvider'
      in Type 'org.joda.time.tz.DateTimeZoneBuilder$PrecalculatedZone'
      in Type 'org.joda.time.tz.DateTimeZoneBuilder$Recurrence'
      in Type 'org.joda.time.chrono.GJLocaleSymbols'
      in Type 'org.apache.solr.request.TestWriterPerf'
      

      These indeed intern a lot of strings but they're typically the same so they don't amount to the growth of permgen.
      This in turn is very steady over the runtime of the test JVM:

      $ egrep -o -e "PS Perm Gen[^%]+" junit4-J0-20130710_122632_726.sysout
      
      PS Perm Gen - I:   20.8 U:   15.9 M:   82.0
      PS Perm Gen - I:   20.8 U:   16.1 M:   82.0
      PS Perm Gen - I:   20.8 U:   34.6 M:   82.0
      PS Perm Gen - I:   20.8 U:   37.7 M:   82.0
      PS Perm Gen - I:   20.8 U:   37.7 M:   82.0
      PS Perm Gen - I:   20.8 U:   37.9 M:   82.0
      PS Perm Gen - I:   20.8 U:   37.9 M:   82.0
      PS Perm Gen - I:   20.8 U:   38.0 M:   82.0
      ...
      PS Perm Gen - I:   20.8 U:   77.3 M:   82.0
      PS Perm Gen - I:   20.8 U:   77.4 M:   82.0
      PS Perm Gen - I:   20.8 U:   77.4 M:   82.0
      PS Perm Gen - I:   20.8 U:   77.4 M:   82.0
      PS Perm Gen - I:   20.8 U:   77.4 M:   82.0
      

      I stands for "initial", U for "used", M for "maximum". So you can see that the permgen is nearly exhausted in this run
      (it didn't OOM though). Out of curiosity I checked for class loading markers – classes are loaded throughout the whole run,
      because each test loads different fragments of the code. So even at the end of the run you get things like:

      Memdump#from: org.apache.solr.update.processor.ParsingFieldUpdateProcessorsTest => 
                Code Cache - I:    2.4 U:   27.0 M:   48.0
             PS Eden Space - I:   62.9 U:   68.7 M:  167.9
         PS Survivor Space - I:   10.4 U:    0.8 M:    0.8
                PS Old Gen - I:  167.5 U:   97.8 M:  341.4
               PS Perm Gen - I:   20.8 U:   72.7 M:   82.0
      [Loaded org.joda.time.ReadWritableInstant from file:/C:/Work/lucene-solr-svn/branch_4x/solr/core/lib/joda-time-2.2.jar]
      [Loaded org.joda.time.ReadWritableDateTime from file:/C:/Work/lucene-solr-svn/branch_4x/solr/core/lib/joda-time-2.2.jar]
      [Loaded org.joda.time.MutableDateTime from file:/C:/Work/lucene-solr-svn/branch_4x/solr/core/lib/joda-time-2.2.jar]
      [Loaded org.joda.time.field.AbstractReadableInstantFieldProperty from file:/C:/Work/lucene-solr-svn/branch_4x/solr/core/lib/joda-time-2.2.jar]
      [Loaded org.joda.time.MutableDateTime$Property from file:/C:/Work/lucene-solr-svn/branch_4x/solr/core/lib/joda-time-2.2.jar]
      [Loaded org.joda.time.chrono.GJLocaleSymbols from file:/C:/Work/lucene-solr-svn/branch_4x/solr/core/lib/joda-time-2.2.jar]
      Memdump#from: org.apache.solr.update.processor.ParsingFieldUpdateProcessorsTest => 
      

      It seems like the problem leading to the permgen exhaustion is just the huge number of classes being loaded under a single class loader (and these
      classes cannot be unloaded because they're either cross-referenced or something else is holding on to them).

      verifying the class-number hypothesis

      It was interesting to answer the question: how much permgen space would it take to load all these classes without running tests? I wrote
      a small utility that parses the output log with class loading information:

      ...
      [Loaded org.apache.lucene.index.DocTermOrds from file:/C:/Work/lucene-solr-svn/branch_4x/lucene/build/core/classes/java/]
      [Loaded org.apache.lucene.search.FieldCacheImpl$DocTermOrdsCache from file:/C:/Work/lucene-solr-svn/branch_4x/lucene/build/core/classes/java/]
      [Loaded org.apache.lucene.search.FieldCacheImpl$DocsWithFieldCache from file:/C:/Work/lucene-solr-svn/branch_4x/lucene/build/core/classes/java/]
      [Loaded org.apache.lucene.search.FieldCache$2 from file:/C:/Work/lucene-solr-svn/branch_4x/lucene/build/core/classes/java/] 
      ...
      

      and turns it into a custom URLClassLoader with the URLs that appear in those entries. Then the tool attempts to load all the referenced classes (and run initializers)
      but does not do anything else. It also dumps the permgen state every 100 classes. The results are as follows:

      # 10240 classes from 61 sources.
          0 -           Code Cache - I:    2.4 U:    0.6 M:   48.0
          0 -        PS Eden Space - I:   62.9 U:   54.1 M: 1319.1
          0 -    PS Survivor Space - I:   10.4 U:    2.6 M:   10.4
          0 -           PS Old Gen - I:  167.5 U:    0.0 M: 2680.0
          0 -          PS Perm Gen - I:   20.8 U:    4.1 M:   82.0
      ...
       1400 -           Code Cache - I:    2.4 U:    0.7 M:   48.0
       1400 -        PS Eden Space - I:   62.9 U:   18.8 M: 1319.1
       1400 -    PS Survivor Space - I:   10.4 U:    3.5 M:   10.4
       1400 -           PS Old Gen - I:  167.5 U:    0.0 M: 2680.0
       1400 -          PS Perm Gen - I:   20.8 U:   12.0 M:   82.0
      ...
       6200 -           Code Cache - I:    2.4 U:    1.3 M:   48.0
       6200 -        PS Eden Space - I:   62.9 U:   33.3 M: 1319.1
       6200 -    PS Survivor Space - I:   10.4 U:   10.4 M:   10.4
       6200 -           PS Old Gen - I:  167.5 U:   10.7 M: 2680.0
       6200 -          PS Perm Gen - I:   20.8 U:   45.6 M:   82.0
      ...
      10239 -           Code Cache - I:    2.4 U:    1.5 M:   48.0
      10239 -        PS Eden Space - I:   62.9 U:    4.8 M: 1319.1
      10239 -    PS Survivor Space - I:   10.4 U:   10.4 M:   10.4
      10239 -           PS Old Gen - I:  167.5 U:   21.7 M: 2680.0
      10239 -          PS Perm Gen - I:   20.8 U:   71.5 M:   82.0
      

      which, if you forgot already, very nicely matches the result acquired from the real test run (classes plus
      interned strings):

      Memdump#from: org.apache.solr.util.FileUtilsTest => 
                Code Cache - I:    2.4 U:   24.3 M:   48.0
             PS Eden Space - I:   62.9 U:   39.6 M:  166.5
         PS Survivor Space - I:   10.4 U:    1.1 M:    2.1
                PS Old Gen - I:  167.5 U:  173.9 M:  341.4
               PS Perm Gen - I:   20.8 U:   77.4 M:   82.0
      

      I repeated the above results with JDK 1.7 (64 bit) and the required permgen space is smaller:

      10239 -           Code Cache - I:    2.4 U:    1.3 M:   48.0
      10239 -        PS Eden Space - I:   62.9 U:   97.5 M: 1319.1
      10239 -    PS Survivor Space - I:   10.4 U:   10.4 M:   10.4
      10239 -           PS Old Gen - I:  167.5 U:   27.5 M: 2680.0
      10239 -          PS Perm Gen - I:   20.8 U:   59.9 M:   82.0
      

      which may be a hint why we're seeing the problem only on 1.6 – we're running very close to the limit and 1.6
      is less space-conservative.

      I also ran it with JRockit (for fun):

      10239 -              Nursery - I:   -0.0 U:   13.0 M: 2918.4
      10239 -            Old Space - I:   64.0 U:   62.0 M: 3072.0
      10239 -         Class Memory - I:    0.5 U:   68.7 M:   -0.0
      10239 -    ClassBlock Memory - I:    0.5 U:    4.0 M:   -0.0
      

      and with J9:

      10239 -        class storage - I:    0.0 U:   41.3 M:   -0.0
      10239 -       JIT code cache - I:    0.0 U:    8.0 M:   -0.0
      10239 -       JIT data cache - I:    0.0 U:    0.3 M:   -0.0
      10239 - miscellaneous non-heap storage - I:    0.0 U:    0.0 M:   -0.0
      10239 -            Java heap - I:    4.0 U:   38.3 M:  512.0
      

      conclusions

      So it's the number of classes that is the core of the problem. The workarounds, in order of difficulty:

      • increase max permgen for hotspot (other JVMs should be able to do it dynamically),
      • split solr core tests into multiple ant sub-calls so that they don't run in a single JVM,
      • change the runner to support running tests in isolation (for example max-N tests per JVM, then relaunch)
      • probably a lot more options here, depending on your current creativity levels
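
      For future reference, here is a compact sketch of the kind of replay utility described above; the exact handling of the -XX:+TraceClassLoading log format and all names are assumptions. It parses the "[Loaded <class> from <url>]" lines, loads every class through a fresh URLClassLoader, and dumps the memory pools every 100 classes:

        import java.io.BufferedReader;
        import java.io.FileReader;
        import java.lang.management.ManagementFactory;
        import java.lang.management.MemoryPoolMXBean;
        import java.net.URL;
        import java.net.URLClassLoader;
        import java.util.ArrayList;
        import java.util.LinkedHashSet;
        import java.util.List;
        import java.util.Set;

        public class ClassLoadReplay {
          public static void main(String[] args) throws Exception {
            List<String[]> classes = new ArrayList<String[]>();   // {binary name, source URL}
            Set<URL> sources = new LinkedHashSet<URL>();
            BufferedReader reader = new BufferedReader(new FileReader(args[0]));
            String line;
            while ((line = reader.readLine()) != null) {
              if (!line.startsWith("[Loaded ") || !line.endsWith("]") || !line.contains(" from ")) continue;
              String[] parts = line.substring("[Loaded ".length(), line.length() - 1).split(" from ", 2);
              if (!parts[1].startsWith("file:")) continue;        // skip bootstrap/shared classes
              classes.add(parts);
              sources.add(new URL(parts[1]));
            }
            reader.close();
            System.out.println("# " + classes.size() + " classes from " + sources.size() + " sources.");

            // parent = null: delegate only to the bootstrap loader, everything else comes from the log's URLs
            URLClassLoader loader = new URLClassLoader(sources.toArray(new URL[0]), null);
            int loaded = 0;
            for (String[] c : classes) {
              try {
                Class.forName(c[0], true, loader);                // load the class and run its static initializers
              } catch (Throwable t) {
                // some classes need runtime state the tests normally set up; ignore
              }
              if (++loaded % 100 == 0) dumpPools(loaded);
            }
            dumpPools(loaded);
          }

          private static void dumpPools(int loaded) {
            for (MemoryPoolMXBean bean : ManagementFactory.getMemoryPoolMXBeans()) {
              System.out.printf("%5d - %20s - U:%7.1f%n", loaded, bean.getName(),
                  bean.getUsage().getUsed() / (1024 * 1024.0d));
            }
          }
        }
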
      Dawid Weiss added a comment -

      Full logs are at: http://goo.gl/gzlwk

      Uwe Schindler added a comment -

      Thanks Dawid,

      good and bad news. I agree, the best would be to split the tests, but for now the only chance is to raise permgen to 128 MB on Hotspot VMs.

      I will prepare a patch that raises permgen for Solr tests only; I have to think about how to combine that in a good way with the Clover special case. I will also check whether Java 8 still allows the permgen parameter or not. J9 and JRockit are fine.

      Steve Rowe added a comment -

      Dawid++

      Jack Krupansky added a comment -

      Is this still looking like a test-only issue, or might users who use Solr intensively with lots of their own add-on plugins hit this same wall, such that things are fine for them with 4.3, but then 4.4 just stops working? Or, do they have an easy workaround by setting PermGen size?

      Uwe Schindler added a comment -

      I was thinking about the same: would the additional classes make users fail after upgrading, even though they don't use Hadoop?

      Dawid Weiss added a comment -

      The easiest way to check would be to expose/inspect permgen stats. For just the number of classes you can run the VM with:

      java ... -XX:+TraceClassLoading | grep "\[Loaded" | wc -l

      which will give you the number of classes loaded by the VM once it exits. The memory pools can also be checked from within the VM (as in the example above) – we could actually add an assertion to LuceneTestCase that would monitor the use of permgen and throw an assertion error if, say, 90% of the maximum permgen space is used. This would prevent permgen-related process-hung-forever issues, or at least it'd be an attempt to detect them early.
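
      A rough sketch of that check (hypothetical names, not existing LuceneTestCase code) could look like this:

        import java.lang.management.ManagementFactory;
        import java.lang.management.MemoryPoolMXBean;
        import java.lang.management.MemoryUsage;

        public final class PermGenGuard {
          private PermGenGuard() {}

          public static void assertPermGenHeadroom() {
            for (MemoryPoolMXBean bean : ManagementFactory.getMemoryPoolMXBeans()) {
              if (!bean.getName().contains("Perm Gen")) continue;  // HotSpot-specific pool name
              MemoryUsage usage = bean.getUsage();
              long max = usage.getMax();                           // -1 if undefined
              if (max > 0 && usage.getUsed() > 0.9 * max) {
                throw new AssertionError(String.format(
                    "Permgen nearly exhausted: %,d of %,d bytes used", usage.getUsed(), max));
              }
            }
          }
        }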

      Mark Miller added a comment -

      Thanks Dawid. I owe you some beer for the effort

      hit this same wall

      It's not really a wall - perm gen is configurable for a reason (mainly because the Oracle impl around perm gen sucks - probably why configuring it is going away, like with other JVMs). 64 or 92 is just an arbitrary size - I don't consider it a big deal if we have to set it to 128.

      But no, this is not all of a sudden going to start biting you in a non-test env because we hit some limit - I've been running Solr like this with no perm gen bump for months in many different situations and envs. We are adding a lot of classes and what not by running hdfs to test against - which we don't do in production - hdfs runs separately.

      David Smiley added a comment -

      Thorough investigation Dawid! Wow.

      Robert Muir added a comment -

      We are adding a lot of classes and what not by running hdfs to test against - which we don't do in production - hdfs runs separately.

      Here is a breakdown of the number of classes in all the .jars of the current war. So I'm confused: if it isn't needed in production, can we remove it from the war?

      rmuir@beast:~/workspace/lucene-trunk/solr/dist/WEB-INF/lib$ (for file in *.jar; do printf "$file\t" && (unzip -l $file | grep class | wc -l); done) | sort -nrk 2 -
      hadoop-hdfs-2.0.5-alpha.jar	1731
      guava-14.0.1.jar	1594
      hadoop-common-2.0.5-alpha.jar	1392
      lucene-core-5.0-SNAPSHOT.jar	1336
      solr-core-5.0-SNAPSHOT.jar	1252
      lucene-analyzers-common-5.0-SNAPSHOT.jar	450
      zookeeper-3.4.5.jar	437
      org.restlet-2.1.1.jar	398
      httpclient-4.2.3.jar	323
      wstx-asl-3.2.7.jar	251
      lucene-queryparser-5.0-SNAPSHOT.jar	247
      solr-solrj-5.0-SNAPSHOT.jar	235
      joda-time-2.2.jar	229
      protobuf-java-2.4.0a.jar	204
      httpcore-4.2.2.jar	190
      lucene-codecs-5.0-SNAPSHOT.jar	169
      commons-configuration-1.6.jar	165
      lucene-queries-5.0-SNAPSHOT.jar	150
      commons-lang-2.6.jar	133
      commons-io-2.1.jar	104
      commons-codec-1.7.jar	85
      lucene-suggest-5.0-SNAPSHOT.jar	77
      lucene-highlighter-5.0-SNAPSHOT.jar	75
      lucene-grouping-5.0-SNAPSHOT.jar	62
      lucene-spatial-5.0-SNAPSHOT.jar	59
      lucene-misc-5.0-SNAPSHOT.jar	55
      lucene-analyzers-kuromoji-5.0-SNAPSHOT.jar	49
      concurrentlinkedhashmap-lru-1.2.jar	44
      commons-fileupload-1.2.1.jar	43
      spatial4j-0.3.jar	41
      hadoop-auth-2.0.5-alpha.jar	26
      commons-cli-1.2.jar	22
      hadoop-annotations-2.0.5-alpha.jar	17
      httpmime-4.2.3.jar	15
      lucene-memory-5.0-SNAPSHOT.jar	14
      noggit-0.5.jar	11
      org.restlet.ext.servlet-2.1.1.jar	8
      lucene-analyzers-phonetic-5.0-SNAPSHOT.jar	6
      
      Mark Miller added a comment -

      can we remove from the war?

      No - those are the classes with the client code that we use to talk to hdfs.

      The tests add a variety of other dependencies and test jars and actually start up hdfs.

      Solr and the webapp simply talk to hdfs.

      Uwe Schindler added a comment -

      Here is a patch that at least fixes the problem for now. Some notes:

      • JRockit, IBM J9 and Java 8 actually ignore the permgen Java option, so it will not hurt. They just print a warning that it is unused.
      • I moved the -Dargs last, because the JVM command line parsing lets the last option override previous ones. So you can override with -Dargs.

      The other options should be investigated later, the current patch should make the problems go away now.

      Uwe Schindler added a comment -

      In general, last night I had an idea: we could start the HDFS cluster in a separate JVM for tests that need it!

      Robert Muir added a comment -

      The tests add a variety of other dependencies and test jars and actually start up hdfs.

      But these account for ~2000 classes whereas the ones in solr.war account for ~3000 classes.

      I mean, this explains why we see the issue: from a test environment this hadoop stuff nearly doubled the number of classes. But the "client code" stuff is heavy too (it's still 3000 additional classes).

      Really if hadoop integration+tests were in a contrib module, we probably wouldnt even see the problem, or we could contain it, because its tests would run isolated in their own jvm(s). Maybe we should do that?

      rmuir@beast:~/workspace/lucene-trunk/solr/test-framework/lib$ (for file in *.jar; do printf "$file\t" && (unzip -l $file | grep class | wc -l); done) | sort -nrk 2 -
      ant-1.8.2.jar	1090
      junit4-ant-2.0.10.jar	1038
      hadoop-common-2.0.5-alpha-tests.jar	675
      hadoop-hdfs-2.0.5-alpha-tests.jar	640
      commons-collections-3.2.1.jar	458
      jersey-core-1.16.jar	351
      junit-4.10.jar	252
      jetty-6.1.26.jar	237
      randomizedtesting-runner-2.0.10.jar	142
      jetty-util-6.1.26.jar	105
      
      Mark Miller added a comment -

I mean, this explains why we see the issue: from a test-environment perspective, this hadoop stuff nearly doubled the number of classes. But the "client code" stuff is heavy too (it's still 3000 additional classes).

That doesn't really matter - what matters is which classes are loaded.

      Maybe we should do that?

I discussed this in the issue - I don't think it should be a contrib, especially not for the reason of a ton of unloaded class files sitting in a jar...

      Robert Muir added a comment -

That doesn't really matter - what matters is which classes are loaded.

It matters to me: some developer accidentally leaves a debugging statement in some core Solr class that references a hadoop class, yet the tests pass because we've increased MaxPermSize for all tests and nobody notices; then we release, and suddenly all users' servers are failing in production.

That's why I'm against just increasing MaxPermSize for all of Solr and saying "well, it's only for tests and doesn't impact real users".

Because nothing will test that that's actually the case.

      Mark Miller added a comment -

Permgen size already varies by JVM, OS, version, etc. This is not a clean battle, and your argument is very general.

If you want a real test for monitoring permgen usage, make one, but it's silly to count on variable ceilings in the wild being high enough to avoid that kind of 'bug'.
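For illustration, such a monitoring check could be built on the standard java.lang.management API. This is only a sketch (the class name and the printed output are made up, and any threshold to assert on would be an additional assumption); it simply reads the usage of whatever permgen pool the running JVM exposes:

import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;

public class PermGenUsageCheck {
  public static void main(String[] args) {
    for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
      // HotSpot names the pool "Perm Gen", "PS Perm Gen" or "CMS Perm Gen";
      // JVMs without a permgen (e.g. Java 8) simply have no matching pool.
      if (pool.getName().contains("Perm Gen")) {
        System.out.println(pool.getName() + ": used=" + pool.getUsage().getUsed()
            + " max=" + pool.getUsage().getMax()); // max is -1 if undefined
      }
    }
  }
}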

      Robert Muir added a comment -

My argument is that if this stuff is really optional and the classes are intended not to be loaded unless you use it, then we should just separate it out as a module (it doesn't have to be under contrib/) and enforce this with the compiler.

Let's be honest: whatever special JVM flags are used in Solr tests, I'm going to recommend people use the same ones in production, just like recommending Jetty over Tomcat: because that's what it's tested with, so I know it should work.

      Mark Miller added a comment -

      My argument is that if this stuff is really optional and the classes are intended not to be loaded unless you use it,

I didn't say it was optional or that the classes are not intended to be loaded unless you use it - I said that the tests start up HDFS and the client code does not. The idea that someone will 'accidentally' import a class that does the equivalent class loading of starting up HDFS is absurd (and impossible, given that the other classes needed to do it are only available in tests).

      Robert Muir added a comment -

I don't care what starts up what; I'm talking about the raw number of classes:

      • ~ 3000 hadoop classes in the .war (linked to solr-core)
      • ~ 2000 hadoop classes from test-framework (used only by tests).

So the argument that this is a test issue is absurd, given that there are actually more hadoop classes in non-test code.

      Mark Miller added a comment -

      If you won't accept that the raw number of classes is not the issue, there is not much else to talk about with you.

So the argument that this is a test issue is absurd, given that there are actually more hadoop classes in non-test code.

That's not true; you just don't understand. Running HDFS loads a lot of those classes. Using the client APIs loads far fewer of them. It's pretty simple...

      Robert Muir added a comment -

It's not that I don't understand; it's that there is nothing to prove that to me: if the solr-core tests only pass with -XXsomearg, then that's the only condition under which I really know solr-core works.

      ASF subversion and git services added a comment -

      Commit 1501789 from Uwe Schindler
      [ https://svn.apache.org/r1501789 ]

      Merged revision(s) 1501678 from lucene/dev/trunk:
      SOLR-5022: Make the Maven build also automatically populate the tests.disableHdfs property by a build profile. Otherwise the maven build fails by default on Windows.

      ASF subversion and git services added a comment -

      Commit 1501790 from Uwe Schindler
      [ https://svn.apache.org/r1501790 ]

      Merged revision(s) 1501678 from lucene/dev/trunk:
      SOLR-5022: Make the Maven build also automatically populate the tests.disableHdfs property by a build profile. Otherwise the maven build fails by default on Windows.

Uwe Schindler added a comment - edited

That's not true; you just don't understand. Running HDFS loads a lot of those classes. Using the client APIs loads far fewer of them. It's pretty simple...

Can we put the client classes into a smaller JAR? Is there none available in Maven? Then we would add the student-first-year MiniDFSCluster into a separate JAR and run it only with tests. This would not solve the permgen problem, but would make the Solr WAR smaller. It would also ensure that no core class accidentally starts hadoop (in extra-slow mode, just joking).

Ideally, could we start a separate (empty) HDFS cluster in a parallel JVM next to all tests and use that one remotely (as it would be in reality - the cluster would not run in the same JVM)? Meaning, ant starts an empty HDFS cluster and all tests use it as the data store?

      Mark Miller added a comment -

      Can we put the client classes into a smaller JAR?

      Simple in theory, hard in practice I think - especially on an ongoing basis.

      Then we would add the student-first-year MiniDFSCluster into a separate JAR and run it only with tests.

      It already is in a separate jar - a test jar that is not part of Solr.

      but would make the Solr WAR smaller.

      Not that noticeably - these jars are already pretty small from a size perspective.

It would also ensure that no core class accidentally starts hadoop

I don't think that's a valid concern - you cannot do this easily at all. First, without the MiniDFSCluster code from the test jars, good luck to you. Second, without the other test dependencies, HDFS won't start. So this is like saying someone in the future could write a virus into Solr, so we'd better not run code. The only way you could load all those classes is if you were hell-bent on doing it, and even then it would not be easy.

      Uwe Schindler added a comment -

      OK, Mark. The hadoop-Jars are already small, OK.

My second idea was to run the MiniDFSCluster (which also needs Jetty 6.1, right?) as a separate Java process started before the tests and shut down after the tests. This would also emulate a more-real-world scenario, because in production you would never ever run the storage cluster in the same JVM...

      How about this? Is it worth a try?
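As an illustration of that idea only (a sketch, not an actual patch: the class name and the one-datanode default are made up), a tiny launcher that ant could fork before the test run and terminate afterwards might look roughly like this, assuming the hadoop-hdfs test jar and its dependencies are on the forked JVM's classpath:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.MiniDFSCluster;

public class StandaloneMiniDfs {
  public static void main(String[] args) throws Exception {
    // one namenode, one datanode, default (temporary) storage directories
    final MiniDFSCluster cluster = new MiniDFSCluster.Builder(new Configuration())
        .numDataNodes(1)
        .build();
    cluster.waitActive();
    System.out.println("hdfs up at " + cluster.getURI()); // tests would pick this URI up
    Runtime.getRuntime().addShutdownHook(new Thread() {
      @Override
      public void run() {
        cluster.shutdown(); // best-effort cleanup when the forked process is stopped
      }
    });
    Thread.sleep(Long.MAX_VALUE); // block until ant terminates the process
  }
}

The tests themselves would then only need the HDFS client classes plus the URI printed (or otherwise published) by the launcher.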

      Mark Miller added a comment -

My second idea was to run the MiniDFSCluster (which also needs Jetty 6.1, right?) as a separate Java process started before the tests and shut down after the tests.

      Yeah, I have not responded yet because I think it's both interesting and scary.

I think it could work, but it does make some things more difficult. You could no longer easily run tests that work against HDFS in your IDE, right? Not without starting up a separate HDFS and pointing the test at it.

      And you end up with other issues to debug that are harder because you have more moving pieces - and logging output is now split....

I'm not totally against it, but I think it has its own issues.

      This would also emulate a more-real-world scenario, because in production you would never ever run the storage cluster in the same JVM...

Yes, we have larger integration tests at Cloudera that test against a real HDFS setup (outside of the unit tests). The mini cluster is what the hadoop tests themselves rely on, though, so it's a pretty solid way to test when it comes to hadoop.

      Uwe Schindler added a comment -

We are somehow in a deadlock here:

• I have to stop Jenkins from failing all the time. As a fix does not seem to be coming soon (Robert does not like the patch raising permgen - same on my side), I will pass -Dtests.disableHdfs=true to the Policeman Linux and MacOSX jobs - sorry! The other Jenkins servers don't run Hadoop. (A skip guard along these lines is sketched after this comment.)
• I would (like Robert) prefer to move the Hadoop Directory and all its dependencies to a separate module. I have no idea why this is so hard; from what I can see there are approximately 10 test classes in a separate package. Just svn mv all this stuff (oas.cloud.hdfs.**) to a new module and we are done? WHERE IS THE PROBLEM IN DOING THIS?
• I would not like to have Hadoop installed by default, as most users won't use it. If you want a Hadoop-enabled Solr, install the contrib into your instance's lib folder. This is the only way to solve this issue.
• If you want to reuse a test from core but run it on HDFS, import core's tests into the hadoop module too, subclass the core tests, give them a separate configuration, and start up the MiniDFSCluster.

I just repeat: I have no problem with Hadoop at all! It's fine to store an index in HDFS (maybe), although I would prefer to store the index on local disks with MMap! But this is purely optional, so it should be in a separate module!
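For illustration, the skip guard mentioned above could be a one-liner in a shared base class for the HDFS tests. This is only a sketch (the class name is made up, and in reality it would extend the Solr test base class); it uses the tests.disableHdfs property that appears in the commits above:

import org.junit.Assume;
import org.junit.BeforeClass;

public abstract class HdfsTestBase /* would extend the Solr test base class in reality */ {
  @BeforeClass
  public static void skipIfHdfsDisabled() {
    // Boolean.getBoolean reads the system property and is false when it is absent,
    // so the tests run normally unless -Dtests.disableHdfs=true is passed.
    Assume.assumeTrue(!Boolean.getBoolean("tests.disableHdfs"));
  }
}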

Uwe Schindler added a comment - edited

NOTE: I (temporarily) disabled the HDFS tests on Policeman Jenkins.

      Dawid Weiss added a comment -

My second idea was to run the MiniDFSCluster (which also needs Jetty 6.1, right?) as a separate Java process started before the tests and shut down after the tests.

This is where TestNG's BeforeSuite and AfterSuite annotations would be so handy. They are executed before and after all tests, so they act as a setup for "all tests", regardless of their number - very handy for setting up one-time costly things like a web server. JUnit doesn't have this functionality as far as I know. Perhaps it could be patched in at the randomizedtesting runner level if there's interest (as a non-standard JUnit extension).

      Uwe Schindler added a comment -

Updated patch, applying the permgen setting only to Solr core. This is a quick hack, and I don't like it (for the reasons explained above).

Once we move HDFS to a separate contrib or module, the same logic could be applied there.

      Uwe Schindler added a comment -

This is where TestNG's BeforeSuite and AfterSuite annotations would be so handy. They are executed before and after all tests, so they act as a setup for "all tests", regardless of their number - very handy for setting up one-time costly things like a web server. JUnit doesn't have this functionality as far as I know. Perhaps it could be patched in at the randomizedtesting runner level if there's interest (as a non-standard JUnit extension).

Could this be done as a separate attribute on the junit4 task, with a Java interface supplying beforeSuite and afterSuite?

      Dawid Weiss added a comment -

Yeah... now that I think of it, it'd be tricky. You'd want to be able to run isolated classes from IDEs like Eclipse, so it'd have to be built into the RandomizedRunner, not the ant task.

I think it'd have to be done using method annotations and a common superclass (LuceneTestCase or its equivalent subclass that handles Solr tests). This would ensure you could still run your tests from Eclipse or other IDEs (because the class hierarchy being run would still hold those annotations). Whether or not everything would fall into place is hard to tell (IDEs vary, and some trickery would be needed to detect the before/after-all-tests moment from within the runner itself).
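To make the trade-off concrete, here is a rough sketch of the common-superclass workaround under plain JUnit 4 (assumptions only - the class and method names are made up, and this is not an actual randomizedtesting feature): the costly fixture starts lazily on first use and is torn down by a JVM shutdown hook, which approximates before/after-suite semantics without runner support.

import org.junit.BeforeClass;

public abstract class SharedFixtureTestBase {
  private static boolean started;

  @BeforeClass
  public static synchronized void startSharedFixtureOnce() {
    if (started) {
      return; // later suites in the same forked JVM reuse the running fixture
    }
    started = true;
    startExpensiveFixture(); // e.g. an external server; placeholder below
    Runtime.getRuntime().addShutdownHook(new Thread() {
      @Override
      public void run() {
        stopExpensiveFixture(); // torn down only when the test JVM exits
      }
    });
  }

  private static void startExpensiveFixture() { /* placeholder */ }

  private static void stopExpensiveFixture() { /* placeholder */ }
}

The caveat from the comment above still applies: nothing here knows when the last suite has finished, so teardown only happens at JVM exit, and an IDE that forks one JVM per class ends up with one fixture per class anyway.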

      Steve Rowe added a comment -

      Bulk move 4.4 issues to 4.5 and 5.0

      Uwe Schindler added a comment -

      Move issue to Solr 4.9.


        People

• Assignee:
  Uwe Schindler
• Reporter:
  Mark Miller
• Votes:
  0
• Watchers:
  7
