MAHOUT-916: Make Mahout's tests run in parallel

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.8
    • Component/s: build

      Description

      Maven now supports parallel execution of tests. We should hook this in to Mahout.
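
      A minimal sketch of the kind of maven-surefire-plugin configuration this involves (the element names are real Surefire options; the values here are illustrative placeholders, not the settings this issue eventually settled on):

      ```xml
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-surefire-plugin</artifactId>
        <configuration>
          <!-- run whole test classes on parallel threads inside one JVM -->
          <parallel>classes</parallel>
          <!-- interpret threadCount as threads per available core -->
          <perCoreThreadCount>true</perCoreThreadCount>
          <threadCount>1</threadCount>
        </configuration>
      </plugin>
      ```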

      1. MAHOUT-916.patch
        0.6 kB
        Isabel Drost-Fromm
      2. MAHOUT-916.patch
        47 kB
        Isabel Drost-Fromm
      3. MAHOUT-916.patch
        0.5 kB
        Isabel Drost-Fromm
      4. MAHOUT-916.patch
        0.9 kB
        Grant Ingersoll


          Activity

          Grant Ingersoll added a comment -

          worth trying out.

          Sean Owen added a comment -

          The patch works for me. However it takes just about the same time for me – one hour.

          I see this message on every test:
          2011-12-09 10:02:26.194 java[10996:ad03] Unable to load realm info from SCDynamicStore
          but I think it's unrelated:
          https://issues.apache.org/jira/browse/HADOOP-7489

          I think the config is quite right though. Is anyone seeing a speedup – is it perhaps an artifact of my local environment?

          Grant Ingersoll added a comment -

          Yeah, not sure it saves much. One thing I wonder about is the use of "balanced". I think, in theory, the benefit of this comes from running the tests multiple times, as Maven will collect stats about which runs take longer.

          Grant Ingersoll added a comment -

          The other weird thing is that even when I run this w/ a thread count of 4 per core, it barely touches the CPU. I would expect it to be pushing the CPU harder.

          Grant Ingersoll added a comment -

          On that note, I suppose it's because the thread count is per forked JVM, so you really don't see much benefit. I wonder: if we could get back to forking only once, would we then see some real benefit?

          Sean Owen added a comment -

          That is what I see. I am not even sure this is forking 4 JVMs. The "perCoreThreadCount" setting doesn't help either. I think there just isn't quite a setting for what we want to do in this ticket.

          Sean Owen added a comment -

          More like "CantFix" – neither of us ever found a way to run the tests in parallel that worked correctly.

          Isabel Drost-Fromm added a comment -

          Tried with the settings of our current surefire/junit configuration - works for me now.

          Isabel Drost-Fromm added a comment -

          Fix to the parent pom to make tests run in parallel - currently with 1.5× the number of available cores.

          Comparing two runs on a machine with four cores shows all cores busy when running tests.

          Timings on a four core machine for a "mvn clean install -o" (after running the build once to get all dependencies into the local repository):

          Before:

          real 18m39.195s
          user 11m7.470s
          sys 1m17.625s

          After:

          real 4m24.783s
          user 10m34.304s
          sys 1m2.660s

          Ted Dunning added a comment -

          That is a big improvement.

          Ted Dunning added a comment -

          Or not.

          I am running on a 16 core machine and 1.5C parallelism causes 10% of tests to fail.

          Even 4 way parallelism causes errors like this:

          Running org.apache.mahout.vectorizer.EncodedVectorsFromSequenceFilesTest
          Tests run: 4, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 3.806 sec <<< FAILURE!
          testCreateNamed(org.apache.mahout.vectorizer.EncodedVectorsFromSequenceFilesTest)  Time elapsed: 0.265 sec  <<< ERROR!
          java.lang.RuntimeException: org.xml.sax.SAXParseException; lineNumber: 34; columnNumber: 105; XML document structures must start and end within the same entity.
          	at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(DOMParser.java:253)
          	at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:288)
          	at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:121)
          	at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:1161)
          	at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:1109)
          	at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:1045)
          	at org.apache.hadoop.conf.Configuration.get(Configuration.java:397)
          	at org.apache.hadoop.mapred.JobConf.checkAndWarnDeprecation(JobConf.java:1910)
          	at org.apache.hadoop.mapred.JobConf.<init>(JobConf.java:378)
          	at org.apache.hadoop.mapred.LocalJobRunner$Job.<init>(LocalJobRunner.java:150)
          	at org.apache.hadoop.mapred.LocalJobRunner.submitJob(LocalJobRunner.java:437)
          	at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:983)
          	at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:912)
          	at java.security.AccessController.doPrivileged(Native Method)
          	at javax.security.auth.Subject.doAs(Subject.java:416)
          	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
          	at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:912)
          	at org.apache.hadoop.mapreduce.Job.submit(Job.java:500)
          	at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:530)
          	at org.apache.mahout.vectorizer.SimpleTextEncodingVectorizer.createVectors(SimpleTextEncodingVectorizer.java:63)
          	at org.apache.mahout.vectorizer.EncodedVectorsFromSequenceFiles.run(EncodedVectorsFromSequenceFiles.java:99)
          	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
          	at org.apache.mahout.vectorizer.EncodedVectorsFromSequenceFiles.main(EncodedVectorsFromSequenceFiles.java:37)
          	at org.apache.mahout.vectorizer.EncodedVectorsFromSequenceFilesTest.runTest(EncodedVectorsFromSequenceFilesTest.java:110)
          	at org.apache.mahout.vectorizer.EncodedVectorsFromSequenceFilesTest.testCreateNamed(EncodedVectorsFromSequenceFilesTest.java:77)
          
          Running org.apache.mahout.vectorizer.collocations.llr.GramKeyPartitionerTest
          
          Isabel Drost-Fromm added a comment -

          Seeing that as well now - not sure why I didn't run into the issues when doing the original timing. Looks like some tests are trying to modify the same test data on disk in multiple test cases. Looking closer into the failing tests now.

          Dawid Weiss added a comment -

          I'm not advocating making the build any more complex, but the runner from randomizedtesting (used in Lucene) gives each forked JVM its own working directory so that their data don't clash with each other. Uwe also came up with a nice idea of placing a security manager on top that restricts writes to that fork's directory and nowhere else.

          The upside of this is that you immediately know if something attempts to write where it's not supposed to. The downside is that the build gets a bit more complex (it's not Surefire, it's a different runner). The Surefire folks are pretty open to ideas though – perhaps they could add isolation like that to Surefire.

          Isabel Drost-Fromm added a comment -

          A status update in between: setting "parallel" to class level only seems to help a bit. Most of the trouble is caused by the tests for Hadoop jobs. They are supposed to write only to a separate temp directory, but at least configuration (de-)serialisation seems to cause issues (currently waiting for the Hadoop source to download - the conference WiFi isn't quite as good as it was yesterday).

          @Dawid - thanks for the proposal. I'll definitely look into that - at least to figure out where exactly stuff is going wrong.

          Isabel Drost-Fromm added a comment -

          The problem is that some of our tests run a local Hadoop cluster. Starting such a cluster automatically creates a directory /tmp/hadoop-$USER and stores the job configuration there. In multi-threaded test execution this can lead to race conditions, with tests trying to write to the same file at the same time.

          To reproduce in a single threaded setting just create /tmp/hadoop-$USER on your machine, issue a "chmod 000 /tmp/hadoop-$USER" and try to re-run the test suite.

          Setting the option "mapred.local.dir" to point to a test specific location avoids part of the issue. In many of our tests this can easily be added to the configuration object we hand over to the job anyway (see updated patch for an idea of what the changes for core might look like).

          There are issues left with some tests executing jobs via the command line interface and some data still being stored in the "/tmp/hadoop-$USER" directory. Will look into that later this week.
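
          The per-test directory idea above can be sketched as follows. This is a hypothetical helper using only the JDK; the Hadoop-specific call is shown as a comment because it needs hadoop-core on the classpath:

          ```java
          import java.io.IOException;
          import java.nio.file.Files;
          import java.nio.file.Path;

          public class TestJobConf {
              // Hypothetical helper: give every test its own local dir instead of
              // the shared /tmp/hadoop-$USER. In a real Mahout test the value
              // would then be set on the org.apache.hadoop.conf.Configuration
              // handed over to the job, e.g.:
              //   conf.set("mapred.local.dir", dir.toString());
              static Path newTestLocalDir(String testName) throws IOException {
                  return Files.createTempDirectory("mahout-" + testName + "-");
              }

              public static void main(String[] args) throws IOException {
                  Path a = newTestLocalDir("EncodedVectorsTest");
                  Path b = newTestLocalDir("EncodedVectorsTest");
                  // Concurrent tests get distinct, writable directories.
                  System.out.println(!a.equals(b) && Files.isWritable(a));
              }
          }
          ```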

          Dawid Weiss added a comment -

          You'd have to check, but I bet /tmp is based on the system property java.io.tmpdir. For Lucene this is solved by setting:

                      <sysproperty key="java.io.tmpdir" value="." />
          

          Again - this works because randomizedtesting splits the cwd for each forked JVM, but perhaps you could set up a class rule on Hadoop tests that temporarily sets up a sub-tmp folder and overrides this property. Might work.
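
          A plain-JDK sketch of that idea (hypothetical; a real JUnit class rule would wrap this in setup/teardown). Note that java.io.File caches java.io.tmpdir lazily, so the property must be overridden before the first temp file is created:

          ```java
          import java.io.File;
          import java.io.IOException;
          import java.nio.file.Files;
          import java.nio.file.Path;

          public class ForkTmpDir {
              // Point java.io.tmpdir at a directory private to this forked JVM
              // so that temp files from concurrently running forks cannot collide.
              public static Path isolateTmpDir() throws IOException {
                  Path forkTmp = Files.createTempDirectory("fork-tmp-");
                  System.setProperty("java.io.tmpdir", forkTmp.toString());
                  return forkTmp;
              }

              public static void main(String[] args) throws IOException {
                  Path forkTmp = isolateTmpDir();
                  // New temp files now land under the fork-private directory.
                  File f = File.createTempFile("test", ".tmp");
                  f.deleteOnExit();
                  System.out.println(f.toPath().startsWith(forkTmp));
              }
          }
          ```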

          Isabel Drost-Fromm added a comment -

          Unfortunately it isn't quite as easy - Hadoop itself sets that system property to pass configured temporary directories to a job. The variable to set is hadoop.tmp.dir. I am currently adjusting our unit tests to pass a configuration that includes this setting down to Hadoop jobs.

          In that process I came across a few drivers that do not pass a configuration object down to the sub-jobs they spawn (which means that user-supplied options are not taken into account for these sub-jobs) - I will open a separate issue for those.

          Isabel Drost-Fromm added a comment -

          In order to be able to run tests in parallel we need to become independent of any test global temp directories.

          Isabel Drost-Fromm added a comment -

          Updated version, note that you will also need MAHOUT-1200 and MAHOUT-1201 in order to make these changes work without concurrency issues.

          Isabel Drost-Fromm added a comment -

          As for runtime: at least on my machine (4 cores, SSD) the changes more than halve the time needed to run a full build including tests (excluding dependency download, of course).

          Isabel Drost-Fromm added a comment -

          Final comment on this one: the number of parallel threads is currently set to 1.5× the number of CPUs reported by your machine.

          That means that in order to avoid swapping (that is, to avoid having your tests run slower than when executed sequentially) the maximum amount of memory for each thread should be configured as machine memory / (1.5 × number of cores) - this can be done in the maven-surefire-plugin itself. Currently each thread is configured to use 1.8 GB if I see that correctly. Reduce this if you think it is too much for the average developer's machine.
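
          The sizing rule above, as a small worked example (hypothetical helper; the numbers are illustrative):

          ```java
          public class HeapBudget {
              // Rule of thumb from the comment above: to avoid swapping, give
              // each parallel test thread at most totalRam / (1.5 * cores) heap.
              static double maxHeapPerThreadGb(double totalRamGb, int cores) {
                  return totalRamGb / (1.5 * cores);
              }

              public static void main(String[] args) {
                  // A 16 GB, 4-core machine supports the 1.8 GB default
                  // (budget ~2.7 GB per thread); with only 8 GB the budget
                  // drops to ~1.3 GB, below the 1.8 GB default.
                  System.out.println(maxHeapPerThreadGb(16, 4) > 1.8);
                  System.out.println(maxHeapPerThreadGb(8, 4) < 1.8);
              }
          }
          ```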

          Hudson added a comment -

          Integrated in Mahout-Quality #2027 (See https://builds.apache.org/job/Mahout-Quality/2027/)
          MAHOUT-916 - make mahout tests run in parallel

          Currently set to class-level parallelism (method-level parallelism fails with our tests at the
          moment), running 1.5× as many test threads as the machine has CPUs. (Revision 1488632)

          Result = SUCCESS
          isabel :
          Files :

          • /mahout/trunk/pom.xml
          Sean Owen added a comment -

          This makes the tests much slower for me: on my laptop the several threads' I/O just contends for the disk head, with CPU at only 20%. It also makes my laptop unusable. I'd suggest at most 1 thread per core – or maybe cap it at 2-3 threads.

          Suneel Marthi added a comment -

          Agree with Sean, that's been my experience too.

          Grant Ingersoll added a comment -

          Can we parameterize it?

          Sean Owen added a comment -

          I think the full and intended config here is:

          <forkCount>X</forkCount>
          <threadCount>1</threadCount>
          <perCoreThreadCount>false</perCoreThreadCount>
          <reuseForks>false</reuseForks>
          <parallel>classes</parallel>

          … and the question is just what X is. This makes sure that a fresh JVM is used for each test class, which I believe is necessary to fully isolate the RNGs.

          But even at X=2 I still get a system that is completely pegged on I/O once the Hadoop-related jobs start. I am not sure even this minimal parallelism is usable.
          (I have a 2-core 2.66GHz Intel i7 MacBook with 8GB RAM and a 5400rpm drive. It's not a memory issue, and it's not a weak machine.)

          Does this mirror anyone else's experience?

          Dawid Weiss added a comment -

          > 5400rpm drive.

          This is probably the performance killer. If somebody has a (decent) SSD they won't feel it.

          In Lucene there is a concept of "user config" properties where one can override the defaults and adjust them for a particular hardware (lower concurrency of tests, for example). Perhaps something like this would be applicable here – don't know how to implement it in Maven though.

          Grant Ingersoll added a comment -

          They are a lot faster for me and my computer is still usable. I'm on a 2.2GHz i7 MBP w/ 16 GB of RAM. So, a slower machine with more memory, relatively speaking.

          Sean Owen added a comment -

          Then, a question: is the difference between Suneel and me vs. Grant and Isabel an SSD? I don't have one. That would certainly explain the difference, I think.

          Sean Owen added a comment -

          After looking at this again I am pretty certain the ultimate cause is memory. There is a load of swapping going on for me, and while the jobs generate a lot of I/O, it is probably the swapping that is overwhelming the disk. An SSD would certainly do better, but it isn't the root issue.

          On a 4-core machine this is going to try to run 6 additional JVMs, each with a max heap size of 1.8GB. Figuring 25% Java overhead and leaving a GB for everything else on the system, this needs 14.5GB to run comfortably. That seems pretty high as a default requirement.

          But I appear to be running all these tests successfully with a max heap size of 768M. That works comfortably with 8GB RAM and no meaningful swapping. How about making that revision to bring the default requirement down?

          In fact, changing this to use 1 JVM per core (vs 1.5) would also let it squeeze onto a 4GB / 4-core machine.

          Grant Ingersoll added a comment -

          I don't have an SSD. Let me re-run things and see how it goes. Maybe I'm just used to Lucene's tests, which usually do peg my CPU (thanks, Dawid, for the insight on tuning them!)

          Sean Owen added a comment -

          Better still, it works with a 512MB heap per JVM for me. The final proposed config is this:

          <threadCount>1</threadCount>
          <perCoreThreadCount>false</perCoreThreadCount>
          <parallel>classes</parallel>
          <argLine>-Xmx512m</argLine>

          Grant Ingersoll added a comment -

          Sounds right to me.

          Hudson added a comment -

          Integrated in Mahout-Quality #2040 (See https://builds.apache.org/job/Mahout-Quality/2040/)
          MAHOUT-916 reduce max JVM heap during parallel tests to avoid thrashing from over-committed memory (Revision 1489827)

          Result = SUCCESS
          srowen :
          Files :

          • /mahout/trunk/pom.xml

            People

            • Assignee: Isabel Drost-Fromm
            • Reporter: Grant Ingersoll
            • Votes: 0
            • Watchers: 7
