Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.4.1
    • Fix Version/s: 3.1, 4.0-ALPHA
    • Component/s: core/index
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      The Lucene tests can be parallelized to make for a faster testing system.

      This task from ANT can be used: http://ant.apache.org/manual/CoreTasks/parallel.html

      Previous discussion: http://www.gossamer-threads.com/lists/lucene/java-dev/69669

      Notes from Mike M.:

      I'd love to see a clean solution here (the tests are embarrassingly
      parallelizable, and we all have machines with good concurrency these
      days)... I have a rather hacked up solution now, that uses
      "-Dtestpackage=XXX" to split the tests up.

      Ideally I would be able to say "use N threads" and it'd do the right
      thing... like the -j flag to make.

      1. LUCENE-1709-2.patch
        1 kB
        Shai Erera
      2. LUCENE-1709.patch
        23 kB
        Robert Muir
      3. LUCENE-1709.patch
        19 kB
        Robert Muir
      4. LUCENE-1709.patch
        19 kB
        Robert Muir
      5. LUCENE-1709.patch
        15 kB
        Robert Muir
      6. LUCENE-1709.patch
        10 kB
        Robert Muir
      7. LUCENE-1709.patch
        10 kB
        Robert Muir
      8. runLuceneTests.py
        41 kB
        Michael McCandless

        Issue Links

          Activity

          Jason Rutherglen added a comment -

          Given my limited understanding of ANT, it seems like the ideal
          long-term solution would be to set a numThreads attribute on JUnitTask,
          which underneath executes the tests in BatchTest in parallel.

          I'm not sure how to hack Parallel and JUnitTask together in the
          ANT XML.

          If it's possible for JUnit to somehow return Tasks up the chain
          to Parallel, that might work?

          Jason Rutherglen added a comment -

          So I'm thinking we'd create a ParallelJUnitTask that extends JUnitTask and accepts a threadCount property. We can reuse the threading code from Parallel.

          Hoss Man added a comment -

          if you're looking to extend ant it might be better to tackle that on the dev@ant list ... there's only two or three ant hackers i can think of in the lucene community (and a lot in the ant community)

          I would also suggest becoming familiar with these two links...
          http://www.nabble.com/1.7.1---Beta-Vote-to16148645.html#a15698118
          https://parallel-junit.dev.java.net/

          As mentioned in the "Previous discussion" link: a simple way for lucene to get some parallelization using existing ant functionality would be to extend (or make a variant of) our contrib-crawl so it could run the contrib tests in parallel (contrib crawl can't be parallelized in the general case because some contribs have dependencies on other contribs and build them if they aren't already built)

          Jason Rutherglen added a comment -

          Thanks for the links!

          Jason Rutherglen added a comment -

          As mentioned in the "Previous discussion" link: a simple
          way for lucene to get some parallelization using existing ant
          functionality would be to extend (or make a variant) of our
          contrib-crawl so it could run the contrib tests in parallel
          (contrib crawl can't be parallelized in the general case because
          some contribs have dependencies on other contribs and build them
          if they aren't already built)

          I'd like to get test-core multithreaded. In looking at the
          ant-junit code, it's unfortunately not easy (as noted in the
          links) to execute the batchtest(s) in parallel. So agreed that
          the most obvious solution belongs in the ant world. However
          I still hold out some hope for a custom interim solution.

          Johan Kindgren added a comment -

          Just a thought from an outsider...
          Do you mean that you would implement your own task? In which module would you put this custom ant task? This would mean that you need a module in Lucene just to be able to build Lucene?

          I can't help but keep pointing to Maven, which has already solved this problem...
          http://maven.apache.org/plugins/maven-surefire-plugin/examples/testng.html
          Look at "running tests in parallel".
          I haven't tried this yet, but why spend time reinventing the wheel?

          Jason Rutherglen added a comment -

          Do you mean that you would implement your own task? In
          which module would you put this custom ant task?

          Yes, seems creating a custom task should work? In contrib/ant?

          Last time I tried to use Maven I couldn't get it to work (was
          running into a bug) so my knowledge isn't very good. We're using
          ant and ivy for our webapp dev. I don't know what Lucene's
          position is on Maven, but I'm interested in pursuing whatever makes
          sense.

          Show
          Jason Rutherglen added a comment - Do you mean that you would implement your own task? In which module would you put this custom ant task? Yes, seems creating a custom task should work? In contrib/ant? Last time I tried to use Maven I couldn't get it to work (was running into a bug) so my knowledge isn't very good. We're using ant and ivy for our webapp dev. I don't know what Lucene's position is on Maven, but am interested pursuing whatever makes sense.
          Hide
          Johan Kindgren added a comment -

          I did a quick maven setup just to see the effects of running tests in parallel, and the result was a bit surprising. When running in parallel, it took about 30-40 seconds less (~7.40 minutes) than running in sequence (~8.10 minutes).
          Don't know if there would be a significant boost on a quad-core, my dual core had some idle time left when running with five threads.
          Is it worth the development time and the complexity increase of the build?

          Michael McCandless added a comment -

          Actually I see decent gains from concurrency: when I run tests with 6
          threads my tests run a little over 3X faster (12:59 with 1 thread and
          4:15 with 6 threads).

          I'm using a Python script that launches the threads, each specifying
          -Dtestpackage to run a certain subset of Lucene's tests.

          This is on an OpenSolaris (2009.06) machine, with a Core i7 920 CPU
          (= 8 cores presented to the OS) and an Intel X25M SSD, 12 GB RAM. The
          hardware has quite a bit of concurrency.
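
As a quick check of the "a little over 3X" figure, the timings quoted above can be turned into a speedup and a per-thread efficiency. This is only arithmetic on the numbers in this comment; the helper name `to_seconds` is made up for the illustration.

```python
def to_seconds(mm_ss):
    """Convert a 'minutes:seconds' string such as '12:59' to seconds."""
    minutes, seconds = mm_ss.split(":")
    return int(minutes) * 60 + int(seconds)


serial = to_seconds("12:59")     # 1 thread
parallel = to_seconds("4:15")    # 6 threads
threads = 6

speedup = serial / parallel      # how many times faster the parallel run is
efficiency = speedup / threads   # fraction of ideal linear scaling

print(f"speedup {speedup:.2f}x, efficiency {efficiency:.0%}")
# prints: speedup 3.05x, efficiency 51%
```

So six threads deliver roughly half of ideal linear scaling on this machine, which fits a test suite where a few long-running packages dominate the total time.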

          Michael McCandless added a comment -

          I'm attaching the scary Python script that I use to run the tests
          with multiple threads.

          It's not really generic at all. It's got hardwired paths to my home
          dir, it symlinks build/test to /tmp (= tmpfs on my solaris box), it
          applies a scary patch to the build xml files (and that patch
          depends on the branch; it only works on 2.9, 3.0, trunk, flex now).

          Often the patch fails to apply (as we change the build xml files) so I
          have to go and redo them.

          Sometimes tests have false-positive failures because they temporarily
          fill up the tmpfs; if you ctrl+C the test it may leave turd processes
          running; etc.

          So I wouldn't recommend using this as is.... but maybe someone who is
          good-with-the-ant can take the general idea here and make it work more
          generally with only ant.

          Roughly all that I do is have each thread run its own -Dtestpackage.
          The existing search & index packages are too big, so I split them into
          2. I also roughly ordered the tests so that they are "balanced", so
          that as each thread pulls from the work queue, they "roughly" finish
          at the same time. The test-contrib is run by a single thread.

          With all the speedups, this is what the output looks like:

          [TRUNK]
          0 [0:00:00.310867]: run "ant compile-backwards compile-core compile-demo jar-core compile-test build-contrib"...
          0 [0:00:30.612062]: run "ant test-contrib"...
          1 [0:00:30.617522]: run "ant test-core -Dtestpackagea=index"...
          2 [0:00:30.623840]: run "ant test-core -Dtestpackageb=index"...
          3 [0:00:30.630072]: run "ant test-backwards -Dtestpackagea=index"...
          4 [0:00:30.637612]: run "ant test-backwards -Dtestpackageb=index"...
          5 [0:00:30.645334]: run "ant test-core -Dtestpackagea=search"...
          4 [0:00:58.937803]: run "ant test-core -Dtestpackageb=search"...
          2 [0:00:59.052593]: run "ant test-backwards -Dtestpackagea=search"...
          5 [0:01:31.313156]: run "ant test-backwards -Dtestpackageb=search"...
          2 [0:01:52.617814]: run "ant test-core -Dtestpackage=store"...
          4 [0:02:01.698477]: run "ant test-backwards -Dtestpackage=store"...
          2 [0:02:10.818047]: run "ant test-core -Dtestpackage=analysis"...
          3 [0:02:14.808217]: run "ant test-backwards -Dtestpackage=analysis"...
          1 [0:02:15.786972]: run "ant test-core -Dtestpackageroot=lucene"...
          4 [0:02:21.937936]: run "ant test-backwards -Dtestpackageroot=lucene"...
          1 [0:02:25.898000]: run "ant test-core -Dtestpackage=util"...
          2 [0:02:31.037923]: run "ant test-backwards -Dtestpackage=util"...
          4 [0:02:31.038090]: run "ant test-core -Dtestpackage=document"...
          3 [0:02:32.007975]: run "ant test-backwards -Dtestpackage=document"...
          5 [0:02:38.017968]: run "ant test-core -Dtestpackage=queryParser"...
          3 [0:02:40.097929]: run "ant test-backwards -Dtestpackage=queryParser"...
          4 [0:02:40.151166]: DONE
          0 [0:02:41.927786]: DONE
          1 [0:02:43.077980]: DONE
          2 [0:02:46.198287]: DONE
          5 [0:02:49.168172]: DONE
          3 [0:02:50.197936]: DONE
          
          DONE: took 0:02:51.046643 [528 tests]
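
The general idea described above (a fixed number of threads pulling commands from a shared work queue, with the slowest jobs queued first) might be sketched as follows. This is not the attached runLuceneTests.py; `run_jobs` and its `runner` parameter are hypothetical names for this illustration, and none of the script's path, symlink, or build-patching logic is reproduced.

```python
import queue
import subprocess
import threading


def run_jobs(jobs, num_threads, runner=None):
    """Run every command in `jobs` using `num_threads` worker threads.

    `jobs` is a list of argument lists. It is consumed in order, so the
    caller should list the slowest jobs first to keep the workers balanced.
    Returns a dict mapping each command (as a tuple) to its exit code.
    """
    if runner is None:
        # Default: launch the command as a child process and wait for it.
        runner = subprocess.call

    work = queue.Queue()
    for cmd in jobs:
        work.put(cmd)

    results = {}
    results_lock = threading.Lock()

    def worker():
        while True:
            try:
                cmd = work.get_nowait()
            except queue.Empty:
                return  # queue drained, this worker is finished
            code = runner(cmd)
            with results_lock:
                results[tuple(cmd)] = code

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

A call such as `run_jobs([["ant", "test-core", "-Dtestpackage=store"], ["ant", "test-core", "-Dtestpackage=util"]], num_threads=2)` would start both commands at once. Passing a custom `runner` makes it possible to dry-run the scheduling without invoking ant.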
          
          Robert Muir added a comment -

          Here is an in-progress patch to build.xml/common-build.xml.
          It runs tests in parallel (2 JVMs per CPU).
          On my computer, 'test-core' takes 1:02 and 'test' takes 3:29 with the patch.

          Not ready for committing yet, and needs some improvements and fixes.
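
To illustrate what "2 JVMs per CPU" implies for scheduling, here is a minimal sketch that splits a list of test classes into one group per forked JVM. The thread does not say how the patch actually divides the tests, so the round-robin split, the function name `partition_tests`, and the test class names in the usage note are all assumptions.

```python
import os


def partition_tests(test_classes, threads_per_cpu=2, cpu_count=None):
    """Deal test classes round-robin into one bucket per forked JVM."""
    if cpu_count is None:
        cpu_count = os.cpu_count() or 1  # os.cpu_count() may return None
    buckets = max(1, threads_per_cpu * cpu_count)
    groups = [[] for _ in range(buckets)]
    for i, name in enumerate(sorted(test_classes)):
        groups[i % buckets].append(name)
    return groups
```

For example, `partition_tests(["TestA", "TestB", "TestC", "TestD", "TestE"], threads_per_cpu=2, cpu_count=2)` yields four groups, the first of which holds `TestA` and `TestE`. A round-robin split ignores how long each test takes, so a real setup would probably need some balancing on top of this.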

          Jason Rutherglen added a comment -

          Robert, very nice!

          Robert Muir added a comment -

          Thanks Jason.

          So for newtrunk I applied a similar patch to speed up Solr's tests.
          You can see it here: http://svn.apache.org/viewvc?rev=926470&view=rev

          In this case the output is not interleaved because it uses a special formatter.
          So it basically looks just like you are not using parallel at all.
          Additionally, -Dtestpackage, -Dtestpackageroot and -Dtestcase all work; the first two are also parallelized.

          So, I propose we do the same thing for Lucene tests.
          Solr was simple because it does not have these junit failed flag files.
          I propose we just remove these, like how Solr does contrib.
          Hudson hasn't failed in over a month by the way.

          Mark Miller added a comment -

          +1 on removing those flags - personally I find them unnecessary - and they complicate the build.

          And I would love to see Lucene run in parallel like Solr now.

          Michael McCandless added a comment -

          +1 for removing the flags and committing parallel tests for Lucene too.

          Robert Muir added a comment -

          attached is a patch, before applying it you must do this:

          svn move solr/src/test/org/apache/solr/SolrJUnitResultFormatter.java lucene/src/test/org/apache/lucene/util/LuceneJUnitResultFormatter.java
          

          The formatter itself is really broken: the sync only makes the output work within a single TestSuite.
          We need some file-locking or similar to ensure it is really correct.

          The patch does not do backwards in parallel (only core/contrib)
          Also, there is the TEMP_DIR problem i mentioned, which i haven't addressed here.

          Robert Muir added a comment -

          attached is an updated patch, addressing some of the problems of the previous one:

          Remaining "bugs":

          • Solr's build.xml doesn't yet detect that lucene's tests have changed,
            so you need to do 'ant clean' after this patch, so it will pick up the fact that
            the formatter was moved to Lucene's tests. This is a more general problem
            we need to fix, so that updates to Lucene's test code are reflected in Solr without cleaning.
          • The output is still interleaved at times; we need the file locking to fix this.

          before applying the patch, do this:

          svn move solr/src/test/org/apache/solr/SolrJUnitResultFormatter.java lucene/src/test/org/apache/lucene/util/LuceneJUnitResultFormatter.java
          svn copy lucene/src/test/org/apache/lucene/util/LuceneJUnitResultFormatter.java lucene/backwards/src/test/org/apache/lucene/util/LuceneJUnitResultFormatter.java
          

          This is because the backwards tests need the formatter too, as they are now also run in parallel.

          Robert Muir added a comment -

          attached is an updated patch, run the same commands before applying the patch:

          svn move solr/src/test/org/apache/solr/SolrJUnitResultFormatter.java lucene/src/test/org/apache/lucene/util/LuceneJUnitResultFormatter.java
          svn copy lucene/src/test/org/apache/lucene/util/LuceneJUnitResultFormatter.java lucene/backwards/src/test/org/apache/lucene/util/LuceneJUnitResultFormatter.java
          

          Mark fixed the interleaved output here, so this is good.
          But, there is a problem in that we change java.io.tmpdir in tests, so we cannot use it
          for the lock file. So if we fix tests to only use "tempDir", then we are ok with the formatter,
          as it can just use java.io.tmpdir for its lock file.
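
The lock-file idea can be illustrated with a small sketch. The real formatter is Java (LuceneJUnitResultFormatter) and its code is not shown in this thread, so this is a Python analogue under stated assumptions: `exclusive_output`, `report_suite` and the lock-file name are hypothetical, and `fcntl.flock` is only available on Unix-like systems. The point it demonstrates is that each process takes an exclusive lock on a shared file before writing a whole suite's output, and that the lock file must live in a directory the tests never redirect.

```python
import fcntl
import os
import tempfile
from contextlib import contextmanager

# Hypothetical lock-file location: a temp directory the tests do not change.
LOCK_PATH = os.path.join(tempfile.gettempdir(), "junit-output.lock")


@contextmanager
def exclusive_output(lock_path=LOCK_PATH):
    """Hold an exclusive advisory lock on lock_path while the block runs."""
    with open(lock_path, "a") as lock_file:
        fcntl.flock(lock_file, fcntl.LOCK_EX)  # blocks until the lock is free
        try:
            yield
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)


def report_suite(out_path, suite_name, lines, lock_path=LOCK_PATH):
    """Append one suite's buffered output as a single uninterrupted block."""
    with exclusive_output(lock_path):
        with open(out_path, "a") as out:
            out.write(f"Testsuite: {suite_name}\n")
            for line in lines:
                out.write(f"  {line}\n")
```

Because the output file is closed (and therefore flushed) before the lock is released, another writer cannot slip its lines into the middle of a suite's block.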

          Robert Muir added a comment -

          attached is an updated patch, as Uwe has fixed the previous tempDir issue in the backwards tests.

          I think we are close, there are only two issues I want to address first:

          • Solr tests should do some uptodate check on lucene's test code
          • Benchmark's work directory for tests should be under tempDir
          Robert Muir added a comment -

          attached is a patch that sets the benchmark.work.dir for benchmark tests to tempDir.
          This way these tests do not step on each other.
          Additionally some minor cleanup in the quality test was needed (e.g. use getClass().getResourceAsStream for input files)
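
The reason a per-test work directory stops tests from stepping on each other can be shown with a short sketch. This is a Python illustration of the principle only, not the patch itself; `make_work_dir` and the names in the usage note are hypothetical.

```python
import os
import tempfile


def make_work_dir(temp_root, test_name):
    """Create and return a fresh, uniquely named directory under temp_root."""
    os.makedirs(temp_root, exist_ok=True)
    # mkdtemp appends a random suffix, so concurrent callers never collide.
    return tempfile.mkdtemp(prefix=test_name + "-", dir=temp_root)
```

Two concurrent runs calling `make_work_dir(root, "benchmark")` each get their own directory, so neither can overwrite or delete the other's files.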

          The only thing left now, is to make sure that solr tests detect when lucene core test code is out of date,
          as solr tests use this code... really outside the scope of this issue, but it will cause confusion if
          people have to 'ant clean' after the commit.

          Uwe Schindler added a comment -

          The only thing left now, is to make sure that solr tests detect when lucene core test code is out of date, as solr tests use this code... really outside the scope of this issue, but it will cause confusion if people have to 'ant clean' after the commit.

          The ANT <uptodate/> specialist will look into this!

          Robert Muir added a comment -

          The ANT <uptodate/> specialist will look into this!

          Ok, thanks uwe.

          I will commit this shortly, and send a note to both dev lists mentioning to run ant clean for the time being for any old checkouts.

          I think most people will prefer shaving minutes off their test time for a one-time clean... unless anyone objects!

          Robert Muir added a comment -

          Committed revision 928069.

          Robert Muir added a comment -

          I would like to reopen this issue to address some minor things.

          • junit should not create temp files in the src directory, like junitvmwatcher* etc.
          • we should somehow include ant.jar and ant-junit.jar in such a way that it's easy for
            IDEs to be configured, yet at the same time, ant doesn't give a warning. This
            was already a pre-existing condition for contrib/ant!
          • we should tone down the default threads-per-cpu to 1, and allow
            it to be configurable via sysprop.

          I think these apply to Solr too, so I'm proposing fixing both build.xml files.

          Shai Erera added a comment -

          One more thing - change benchmark tests to run sequentially (by adding the property).
          Robert, are you going to tackle that soon?

          Shai Erera added a comment -

          Since I had the changes in my local environment, I thought it best to generate a patch out of them, so they don't get lost. The patch doesn't cover the ant .jars, only the changes to common-build.xml and benchmark/build.xml.

          Tom Burton-West added a comment -

          I am having the same issue Shai reported in LUCENE-2353 with the parallel tests apparently causing the tests to hang on my Windows box with both Revision 931573 and Revision 931304 when running the tests from root.

          Tests hang in WriteLineDocTaskTest, on this line:
          [junit] ------------> config properties:
          [junit] directory = RAMDirectory
          [junit] doc.maker = org.apache.lucene.benchmark.byTask.tasks.WriteLineDocTaskTest$JustDateDocMaker
          [junit] line.file.out = D:\dev\lucene\lucene-trunk\build\contrib\benchmark\test\W\one-line
          [junit] -------------------------------

          I just ran the test last night with Revision 931708 and had no problem. Ran it again this morning and got the hanging behavior. The difference is that last night the only thing running on my computer besides a couple of ssh terminal windows was the tests. Today when I ran the tests and got the hanging behavior, I have firefox, outlook, exceed, wordpad open. The tests are taking 98-99.9% of my cpu while hanging. I suspect there is some kind of resource issue when running the tests in parallel.

          Tom Burton-West

          Robert Muir added a comment -

          Thanks Tom and Shai... sorry I haven't gotten to fix this yet.

          Shai, would you mind committing your patch? We can keep the issue open to add the sysprop, fix the Ant jar thing, and apply the same fixes to Solr's build.xml.

          Tom Burton-West added a comment -

          This may or may not be a clue to the problem in benchmark. When I control-C'd the hung test, I got the error reported below.
          Tom.

          [junit] directory = RAMDirectory
          [junit] doc.maker = org.apache.lucene.benchmark.byTask.tasks.WriteLineDocTaskTest$JustDateDocMaker
          [junit] line.file.out = C:\cygwin\home\tburtonw\lucene\april07_good\build\contrib\benchmark\test\W\one-line
          [junit] -------------------------------
          [junit] ------------- ---------------- ---------------
          [junit] java.io.FileNotFoundException: C:\cygwin\home\tburtonw\lucene\april07_good\contrib\benchmark\junitvmwatcher203463231158436475.properties (The process cannot access the file because it is being used by another process)
          [junit] at java.io.FileInputStream.open(Native Method)
          [junit] at java.io.FileInputStream.<init>(FileInputStream.java:106)
          [junit] at java.io.FileReader.<init>(FileReader.java:55)
          [junit] at org.apache.tools.ant.taskdefs.optional.junit.JUnitTask.executeAsForked(JUnitTask.java:1025)
          [junit] at org.apache.tools.ant.taskdefs.optional.junit.JUnitTask.execute(JUnitTask.java:876)
          [junit] at org.apache.tools.ant.taskdefs.optional.junit.JUnitTask.execute(JUnitTask.java:803)
          [junit] at org.apache.tools.ant.UnknownElement.execute(UnknownElement.java:288)
          [junit] at sun.reflect.GeneratedMethodAccessor8.invoke(Unknown Source)
          [junit] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
          [junit] at java.lang.reflect.Method.invoke(Method.java:597)
          [junit] at org.apache.tools.ant.dispatch.DispatchUtils.execute(DispatchUtils.java:106)
          [junit] at org.apache.tools.ant.Task.perform(Task.java:348)
          [junit] at org.apache.tools.ant.taskdefs.Sequential.execute(Sequential.java:62)
          [junit] at org.apache.tools.ant.UnknownElement.execute(UnknownElement.java:288)
          [junit] at sun.reflect.GeneratedMethodAccessor8.invoke(Unknown Source)
          [junit] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
          [junit] at java.lang.reflect.Method.invoke(Method.java:597)
          [junit] at org.apache.tools.ant.dispatch.DispatchUtils.execute(DispatchUtils.java:106)
          [junit] at org.apache.tools.ant.Task.perform(Task.java:348)
          [junit] at org.apache.tools.ant.taskdefs.MacroInstance.execute(MacroInstance.java:394)
          [junit] at org.apache.tools.ant.UnknownElement.execute(UnknownElement.java:288)
          [junit] at sun.reflect.GeneratedMethodAccessor8.invoke(Unknown Source)
          [junit] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
          [junit] at java.lang.reflect.Method.invoke(Method.java:597)
          [junit] at org.apache.tools.ant.dispatch.DispatchUtils.execute(DispatchUtils.java:106)
          [junit] at org.apache.tools.ant.Task.perform(Task.java:348)
          [junit] at org.apache.tools.ant.taskdefs.Parallel$TaskRunnable.run(Parallel.java:428)
          [junit] at java.lang.Thread.run(Thread.java:619)

          Robert Muir added a comment -

          Thanks Tom, this is exactly what happened to Shai.

          Can you try his patch and see if it fixed the problem for you?

          Shai Erera added a comment -

          Robert, I will commit the patch, seems good to do anyway. We can handle the ant jars separately later.

          And the hang behavior is exactly what I experienced, including the FileInputStream thing. Only on my machine, when I took a thread dump, it showed that Ant waits on FIS.read() ...

          Robert, a reminder: even with the patch that forces JUnit to use a separate temp folder per thread, it still hung ...

          Tom Burton-West added a comment -

          Hi Robert,

          I patched Revision 931708 and ran "ant clean test-contrib", and the tests ran just fine. The patch seems to have solved the problem.

          Tom

          Robert Muir added a comment -

          I committed the junit tempdir fix in revision 932857... will handle these one at a time.
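
          For readers unfamiliar with the idea behind a per-thread temp folder, here is a minimal Java sketch. It is not the Ant or build.xml change from this issue. The class name PerWorkerTempDir, the method createFor, and the file name watcher.properties are hypothetical, chosen only for illustration. It shows how giving each parallel worker its own directory keeps same-named files from colliding. Note that, as Shai reported above, this isolation alone did not stop the hang on his machine.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, for illustration only: one private temp directory per worker.
public final class PerWorkerTempDir {
    private PerWorkerTempDir() {}

    /** Creates a fresh, uniquely named directory under base for the given worker. */
    public static Path createFor(Path base, int workerId) {
        try {
            Files.createDirectories(base);
            // createTempDirectory appends a random suffix, so names never clash.
            return Files.createTempDirectory(base, "worker-" + workerId + "-");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path base = Paths.get(System.getProperty("java.io.tmpdir"), "parallel-tests-demo");
        Path first = createFor(base, 1);
        Path second = createFor(base, 2);
        // Both workers write a file with the same name without touching each other's copy.
        Files.write(first.resolve("watcher.properties"), "worker=1".getBytes(StandardCharsets.UTF_8));
        Files.write(second.resolve("watcher.properties"), "worker=2".getBytes(StandardCharsets.UTF_8));
        System.out.println(first);
        System.out.println(second);
    }
}
```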

          Shai Erera added a comment -

          Committed revision 932878 with the following:

          1. benchmark tests force sequential run
          2. threadsPerProcessor defaults to 1 and can be overridden by -DthreadsPerProcessor=<value>
          3. A CHANGES entry
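
          To make the threadsPerProcessor semantics in item 2 concrete, here is a small Java sketch. It is not the committed build.xml change. It assumes the total thread count is the number of available processors multiplied by threadsPerProcessor, which is how Ant's parallel task documents that attribute, and the class and method names (ThreadBudget, totalThreads, threadsPerProcessorFromSystemProperty) are hypothetical.

```java
// Hypothetical illustration of "defaults to 1, overridable with -DthreadsPerProcessor=<value>".
public final class ThreadBudget {
    private ThreadBudget() {}

    /** Total worker threads for a processor count and a per-processor factor. */
    public static int totalThreads(int processors, int threadsPerProcessor) {
        if (processors < 1 || threadsPerProcessor < 1) {
            throw new IllegalArgumentException("both arguments must be at least 1");
        }
        return processors * threadsPerProcessor;
    }

    /** Reads the threadsPerProcessor system property; unset, unparsable, or values below 1 yield 1. */
    public static int threadsPerProcessorFromSystemProperty() {
        // Integer.getInteger returns the supplied default when the property is missing or not a number.
        return Math.max(1, Integer.getInteger("threadsPerProcessor", 1));
    }

    public static void main(String[] args) {
        int processors = Runtime.getRuntime().availableProcessors();
        int perProcessor = threadsPerProcessorFromSystemProperty();
        System.out.println("processors=" + processors
                + " threadsPerProcessor=" + perProcessor
                + " totalThreads=" + totalThreads(processors, perProcessor));
    }
}
```

          Running it as "java -DthreadsPerProcessor=2 ThreadBudget" doubles the thread budget relative to the default.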
          Robert Muir added a comment -

          I've propagated Shai's improvements to Solr.
          Additionally, I added the Ant lib as a convenience for IDE users (revision 933575).
          It doesn't cause Ant classpath warnings when running from the command line. If you experience this, please let me know!
          It only means it's easier for IDE users to work on lucene/solr and not see compile errors from contrib/ant and src/test.

          Robert Muir added a comment -

          Backported to 3.x, rev 941663.

          Grant Ingersoll added a comment -

          Bulk close for 3.1


            People

            • Assignee:
              Robert Muir
              Reporter:
              Jason Rutherglen
            • Votes:
              0
              Watchers:
              2

                Time Tracking

                Estimated:
                48h
                Remaining:
                48h
                Logged:
                Not Specified

                  Development