MAHOUT-1345

Enable randomised testing for all Mahout modules

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 0.8
    • Fix Version/s: 0.9
    • Component/s: None
    • Labels: None

      Description

      When enabling randomised testing for all modules I found a few tests became unstable or even fail deterministically due to lingering threads. The attached patch:

      • defines the randomised testing dependency in our parent pom
      • re-uses said dependencies in all depending modules (makes upgrading easier as the version number needs to be changed in just one place)
      • adds several code changes that fixed the failures due to lingering threads for me on my machine. I'd greatly appreciate input a) from those who wrote the respective code and b) others who ran the tests with these changes to make sure there are no other tests that suffer from the same issues.

      Warning: I touched quite a few bits and pieces I'm not intimately familiar with over the last few weeks (whenever I had a few spare minutes) - second pair of eyes needed.

      1. MAHOUT-1345.patch
        26 kB
        Suneel Marthi
      2. MAHOUT-1345.diff
        25 kB
        Isabel Drost-Fromm

        Activity

        Stevo Slavic added a comment -

        +1 for dependency changes, and build/tests pass locally.

        Dawid Weiss added a comment -

        This looks good, Isabel! Just for the sake of clarity – perhaps it'll be useful for others:

             executor.awaitTermination(10, TimeUnit.SECONDS);
        ...
             executor.shutdown();
        

        The executors framework in Java doesn't wait for any threads it creates to die (even after the executor is shut down), and the test framework will check for any remaining threads right after the test is over (and fail if there are any). To add some "slack time" to allow any spawned threads to die, you can use ThreadLeakLingering (and I see that you already do use this in certain cases).

        In Lucene/Solr this is sort of handled by having a separate top-level class from which all test suites inherit; all annotations configuring the framework are defined there.

        Suneel Marthi added a comment - - edited

        Isabel Drost-Fromm Stevo Slavic Applied this patch and I am now seeing random test failures. I am running on Mac OS 10.8 Mountain Lion.

        
        Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 3.96 sec <<< FAILURE! - in org.apache.mahout.text.LuceneSegmentInputFormatTest
        testGetSplits(org.apache.mahout.text.LuceneSegmentInputFormatTest)  Time elapsed: 3.295 sec  <<< ERROR!
        com.carrotsearch.randomizedtesting.ThreadLeakError: 1 thread leaked from TEST scope at testGetSplits(org.apache.mahout.text.LuceneSegmentInputFormatTest): 
           1) Thread[id=14, name=AWT-Shutdown, state=TIMED_WAITING, group=main]
                at java.lang.Object.wait(Native Method)
                at sun.awt.AWTAutoShutdown.run(AWTAutoShutdown.java:284)
                at java.lang.Thread.run(Thread.java:695)
        	at __randomizedtesting.SeedInfo.seed([D01FBCB7576C7478:647C7E93467BBD35]:0)
        
        Tests run: 2, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 4.649 sec <<< FAILURE! - in org.apache.mahout.text.LuceneSegmentInputSplitTest
        testGetSegmentNonExistingSegment(org.apache.mahout.text.LuceneSegmentInputSplitTest)  Time elapsed: 3.571 sec  <<< ERROR!
        com.carrotsearch.randomizedtesting.ThreadLeakError: 1 thread leaked from TEST scope at testGetSegmentNonExistingSegment(org.apache.mahout.text.LuceneSegmentInputSplitTest): 
           1) Thread[id=14, name=AWT-Shutdown, state=TIMED_WAITING, group=main]
                at java.lang.Object.wait(Native Method)
                at sun.awt.AWTAutoShutdown.run(AWTAutoShutdown.java:284)
                at java.lang.Thread.run(Thread.java:695)
        	at __randomizedtesting.SeedInfo.seed([760C118FE5998411:31C8F60AD35952AF]:0)
        
        Running org.apache.mahout.utils.email.MailProcessorTest
        Running org.apache.mahout.utils.nlp.collocations.llr.BloomTokenFilterTest
        Running org.apache.mahout.utils.regex.RegexMapperTest
        Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.902 sec - in org.apache.mahout.utils.email.MailProcessorTest
        Tests run: 3, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 6.855 sec <<< FAILURE! - in org.apache.mahout.text.LuceneSegmentRecordReaderTest
        testNonExistingField(org.apache.mahout.text.LuceneSegmentRecordReaderTest)  Time elapsed: 0.955 sec  <<< ERROR!
        com.carrotsearch.randomizedtesting.ThreadLeakError: 1 thread leaked from TEST scope at testNonExistingField(org.apache.mahout.text.LuceneSegmentRecordReaderTest): 
           1) Thread[id=17, name=AWT-Shutdown, state=TIMED_WAITING, group=main]
                at java.lang.Object.wait(Native Method)
                at sun.awt.AWTAutoShutdown.run(AWTAutoShutdown.java:284)
                at java.lang.Thread.run(Thread.java:695)
        	at __randomizedtesting.SeedInfo.seed([BAD70A9A105E6C9:F7B1E5646F3C1F87]:0)
        
        Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 6.005 sec - in org.apache.mahout.text.SequenceFilesFromMailArchivesTest
        Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.007 sec - in org.apache.mahout.utils.nlp.collocations.llr.BloomTokenFilterTest
        Tests run: 5, Failures: 0, Errors: 3, Skipped: 0, Time elapsed: 8.123 sec <<< FAILURE! - in org.apache.mahout.text.SequenceFilesFromLuceneStorageDriverTest
        testNewLucene2SeqConfiguration(org.apache.mahout.text.SequenceFilesFromLuceneStorageDriverTest)  Time elapsed: 3.33 sec  <<< ERROR!
        com.carrotsearch.randomizedtesting.ThreadLeakError: 1 thread leaked from TEST scope at testNewLucene2SeqConfiguration(org.apache.mahout.text.SequenceFilesFromLuceneStorageDriverTest): 
           1) Thread[id=14, name=AWT-Shutdown, state=TIMED_WAITING, group=main]
                at java.lang.Object.wait(Native Method)
                at sun.awt.AWTAutoShutdown.run(AWTAutoShutdown.java:284)
                at java.lang.Thread.run(Thread.java:695)
        	at __randomizedtesting.SeedInfo.seed([E764A6A6A163BB7D:C2C9DD1521FA15E]:0)
        
        testRunOptionalArguments(org.apache.mahout.text.SequenceFilesFromLuceneStorageDriverTest)  Time elapsed: 2.256 sec  <<< ERROR!
        com.carrotsearch.randomizedtesting.ThreadLeakError: 1 thread leaked from TEST scope at testRunOptionalArguments(org.apache.mahout.text.SequenceFilesFromLuceneStorageDriverTest): 
           1) Thread[id=20, name=AWT-Shutdown, state=TIMED_WAITING, group=main]
                at java.lang.Object.wait(Native Method)
                at sun.awt.AWTAutoShutdown.run(AWTAutoShutdown.java:284)
                at java.lang.Thread.run(Thread.java:695)
        	at __randomizedtesting.SeedInfo.seed([E764A6A6A163BB7D:92E891124EB97769]:0)
        
        testRunInvalidQuery(org.apache.mahout.text.SequenceFilesFromLuceneStorageDriverTest)  Time elapsed: 1.416 sec  <<< ERROR!
        com.carrotsearch.randomizedtesting.ThreadLeakError: 2 threads leaked from TEST scope at testRunInvalidQuery(org.apache.mahout.text.SequenceFilesFromLuceneStorageDriverTest): 
           1) Thread[id=25, name=communication thread, state=TIMED_WAITING, group=TGRP-SequenceFilesFromLuceneStorageDriverTest]
                at java.lang.Object.wait(Native Method)
                at org.apache.hadoop.mapred.Task$TaskReporter.run(Task.java:658)
                at java.lang.Thread.run(Thread.java:695)
           2) Thread[id=26, name=communication thread, state=TIMED_WAITING, group=TGRP-SequenceFilesFromLuceneStorageDriverTest]
                at java.lang.Object.wait(Native Method)
                at org.apache.hadoop.mapred.Task$TaskReporter.run(Task.java:658)
                at java.lang.Thread.run(Thread.java:695)
        	at __randomizedtesting.SeedInfo.seed([E764A6A6A163BB7D:F45193CA73B3494C]:0)
        
        
        Dawid Weiss added a comment -

        This means the code starts (touches) an AWT subsystem somehow and starts a background system thread. Workarounds:

        • add java.awt.headless=true to junit4 sysprops:
          <sysproperty key="java.awt.headless" value="true"/>
        • ignore this particular system thread by adding:
          @ThreadLeakFilters(defaultFilters = true, filters = {
              QuickPatchThreadsFilter.class
          })
          

          to all test classes where this is the case (or to a common superclass of all test classes). QuickPatchThreadsFilter is a custom thread filter from Lucene, but I believe the default filter set also ignores the AWT subsystem, so just defaultFilters=true should do.

        Suneel Marthi added a comment -

        FYI.. these failures are not random and are happening for Lucene tests (lucene2seq, LuceneIterableTest, TestClusterDumper etc..). All tests that have Lucene API calls.

        Dawid Weiss added a comment -

        Lucene is tested with java.awt.headless=true. Might be that jmx bean or something is starting an awt thread – I vaguely remember this to be the case. If you add java.awt.headless=true to your tests this won't be happening.
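
A minimal sketch of what the headless setting does, assuming the property is set before any AWT class initialises. In a real build the property would normally be passed to the forked test JVM by the build configuration rather than set in code; the class name HeadlessSketch is hypothetical and only here for illustration.

```java
import java.awt.GraphicsEnvironment;

// Hypothetical helper, not part of the patch: shows the effect of java.awt.headless.
final class HeadlessSketch {

  /** Sets the headless property and reports whether AWT considers itself headless. */
  static boolean forceHeadless() {
    // Must happen before AWT reads the property for the first time,
    // which is why build-level system properties are the safer place for it.
    System.setProperty("java.awt.headless", "true");
    return GraphicsEnvironment.isHeadless();
  }

  public static void main(String[] args) {
    System.out.println("headless: " + forceHeadless());
  }
}
```

Setting the property in code is fragile because anything that touches AWT earlier wins, so treat this only as a way to check the setting from inside a test.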

        Dawid Weiss added a comment -

        I've applied this patch to the trunk, out of curiosity. This seems to be the well known problem with Mac Java – it starts an AWT thread when mx bean is loaded. I believe this has been fixed in a newer Lucene version, but I'd need a Mac to actually verify this.

        Sean Owen added a comment -

        Calling awaitTermination before shutdown won't do anything. It is something you call after shutdown. So I think this patch has that point reversed.

        java.awt.headless should be set, but it already is AFAICT in the pom.

        There is one place where the patch moves a call to close() inside a catch block – better, but better still in a finally block.
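
The ordering described above (shutdown first, then awaitTermination, with cleanup in a finally block) can be sketched as follows. This is an illustration under assumptions, not the code from the patch: the class name, pool size and timeout are made up for the example.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper; names, pool size and timeout are illustrative only.
final class ExecutorShutdownSketch {

  /** Runs some trivial tasks and returns true if the pool terminated and all tasks ran. */
  static boolean runAndShutDown(int tasks) {
    ExecutorService executor = Executors.newFixedThreadPool(2);
    AtomicInteger completed = new AtomicInteger();
    try {
      for (int i = 0; i < tasks; i++) {
        executor.execute(completed::incrementAndGet);
      }
    } finally {
      // 1. Stop accepting new work; already submitted tasks still run.
      executor.shutdown();
    }
    try {
      // 2. Wait for termination only after the shutdown request has been made.
      boolean terminated = executor.awaitTermination(10, TimeUnit.SECONDS);
      if (!terminated) {
        // Last resort: interrupt workers that are still busy.
        executor.shutdownNow();
      }
      return terminated && completed.get() == tasks;
    } catch (InterruptedException e) {
      // Restore the interrupt flag for the caller and give up waiting.
      Thread.currentThread().interrupt();
      executor.shutdownNow();
      return false;
    }
  }

  public static void main(String[] args) {
    System.out.println("clean shutdown: " + runAndShutDown(5));
  }
}
```

Calling awaitTermination on a pool that has not been shut down simply blocks until the timeout expires, because the pool never reaches the terminated state on its own.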

        Suneel Marthi added a comment - - edited

        Dawid/Sean,

        I updated the patch with Sean's comments and also upgraded Lucene to 4.5.1 (per Dawid's earlier comment). That seems to have done it. I am seeing this one test below (from lucene2seq) that's failing. I haven't had time to look at it yet, will do so later today.

        Upgrading Lucene to 4.5.1 seems to have fixed the other Lucene test failures on Mac OS 10.8.

        
        Tests run: 5, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 8.74 sec <<< FAILURE! - in org.apache.mahout.text.SequenceFilesFromLuceneStorageDriverTest
        testRunInvalidQuery(org.apache.mahout.text.SequenceFilesFromLuceneStorageDriverTest)  Time elapsed: 5.295 sec  <<< ERROR!
        com.carrotsearch.randomizedtesting.ThreadLeakError: 2 threads leaked from TEST scope at testRunInvalidQuery(org.apache.mahout.text.SequenceFilesFromLuceneStorageDriverTest): 
           1) Thread[id=17, name=communication thread, state=TIMED_WAITING, group=TGRP-SequenceFilesFromLuceneStorageDriverTest]
                at java.lang.Object.wait(Native Method)
                at org.apache.hadoop.mapred.Task$TaskReporter.run(Task.java:658)
                at java.lang.Thread.run(Thread.java:695)
           2) Thread[id=18, name=communication thread, state=TIMED_WAITING, group=TGRP-SequenceFilesFromLuceneStorageDriverTest]
                at java.lang.Object.wait(Native Method)
                at org.apache.hadoop.mapred.Task$TaskReporter.run(Task.java:658)
                at java.lang.Thread.run(Thread.java:695)
        	at __randomizedtesting.SeedInfo.seed([1461D3656229CEF6:754E609B0F93CC7]:0)
        
        
        Suneel Marthi added a comment -

        Attached is the updated patch against today's codebase; it works on Mac OS with Lucene >= 4.4 (up from the present 4.3.1). The workaround for the failing test in my previous comment is to add "-xm sequential", which seems to fix it; I will work with Frank Scholten later to fix that test.

        I guess we will be upgrading to the latest stable version of Lucene and Solr prior to the 0.9 release.

        Frank Scholten added a comment -

        I applied the patch, ran a mvn clean install on Ubuntu 13.04 with OpenJDK 1.7.0_25 and got this stacktrace:

        17:28:25 Tests run: 2, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 79.239 sec <<< FAILURE! - in org.apache.mahout.clustering.lda.cvb.TestCVBModelTrainer
        17:28:25 testInMemoryCVB0(org.apache.mahout.clustering.lda.cvb.TestCVBModelTrainer)  Time elapsed: 13.716 sec  <<< ERROR!
        17:28:25 com.carrotsearch.randomizedtesting.ThreadLeakError: 1 thread leaked from TEST scope at testInMemoryCVB0(org.apache.mahout.clustering.lda.cvb.TestCVBModelTrainer): 
        17:28:25    1) Thread[id=358, name=pool-154-thread-1, state=RUNNABLE, group=TGRP-TestCVBModelTrainer]
        17:28:25         at sun.misc.Unsafe.unpark(Native Method)
        17:28:25         at java.util.concurrent.locks.LockSupport.unpark(LockSupport.java:152)
        17:28:25         at java.util.concurrent.locks.AbstractQueuedSynchronizer.unparkSuccessor(AbstractQueuedSynchronizer.java:662)
        17:28:25         at java.util.concurrent.locks.AbstractQueuedSynchronizer.release(AbstractQueuedSynchronizer.java:1263)
        17:28:25         at java.util.concurrent.locks.ReentrantLock.unlock(ReentrantLock.java:460)
        17:28:25         at java.util.concurrent.ThreadPoolExecutor.tryTerminate(ThreadPoolExecutor.java:712)
        17:28:25         at java.util.concurrent.ThreadPoolExecutor.processWorkerExit(ThreadPoolExecutor.java:1006)
        17:28:25         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1163)
        17:28:25         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        17:28:25         at java.lang.Thread.run(Thread.java:724)
        17:28:25 	at __randomizedtesting.SeedInfo.seed([E9C2EFFF8F543D07:7939EAE64DE10951]:0)
        

        I don't know how to interpret this yet. I'll read up on Dawid Weiss' code and docs on randomized testing first. This would be a nice feature!

        A few links I am looking at:

        http://labs.carrotsearch.com/randomizedtesting-concept.html
        https://github.com/carrotsearch/randomizedtesting/tree/master/examples/maven/src/main/java/com/carrotsearch/examples/randomizedrunner

        Dawid Weiss added a comment -

        Frank, this means a thread was created in the scope of a test (or suite – this is configurable) and then wasn't cleaned up properly. The problem with thread pools is that they use unnamed threads by default ("pool-154-thread-1"); if possible it's always best to use a custom thread factory that names threads so that they can be tracked back to their source (the pool that generates them).
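
The thread-naming suggestion above can be sketched with a small custom ThreadFactory. This is a minimal illustration, assuming a simple prefix-plus-counter naming scheme; the class name and the "cvb-trainer" prefix are hypothetical and do not come from the Mahout code.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical factory: gives every pool thread a name that points back to its owner.
final class NamedThreadFactory implements ThreadFactory {
  private final String prefix;
  private final AtomicInteger counter = new AtomicInteger();

  NamedThreadFactory(String prefix) {
    this.prefix = prefix;
  }

  @Override
  public Thread newThread(Runnable task) {
    // Produces names such as "cvb-trainer-1" instead of "pool-154-thread-1".
    return new Thread(task, prefix + "-" + counter.incrementAndGet());
  }

  /** Starts a single-thread pool with this factory and returns the worker's name. */
  static String nameOfFirstWorker(String prefix) {
    ExecutorService pool = Executors.newSingleThreadExecutor(new NamedThreadFactory(prefix));
    try {
      Callable<String> whoAmI = () -> Thread.currentThread().getName();
      return pool.submit(whoAmI).get(5, TimeUnit.SECONDS);
    } catch (Exception e) {
      throw new IllegalStateException("worker did not report its name", e);
    } finally {
      pool.shutdown();
    }
  }

  public static void main(String[] args) {
    System.out.println(nameOfFirstWorker("cvb-trainer"));
  }
}
```

With names like these, a leak report identifies the owning component directly instead of an anonymous pool number.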

        In certain cases thread pools can be shut down and not wait for the forked threads to perish. There is little one can do about it but wait – there is an annotation in rr that does just that (@ThreadLeakLingering(linger = 2000)).

        If you're looking for some advanced scenarios take a look at Lucene source code (LuceneTestCase class).

        Frank Scholten added a comment -

        Thanks for the clarification, Dawid!

        You mean like the example in https://github.com/carrotsearch/randomizedtesting/blob/master/examples/maven/src/main/java/com/carrotsearch/examples/randomizedrunner/Test010Lingering.java?

        I am not familiar with the CVB0 code but to me it seems the test fails because the CachingCVB0mapper starts a few threads via TopicModel and/or ModelTrainer which are not cleaned up properly. Correct?

        I added a ThreadLeakLingering annotation on TestCVBModelTrainer and now my build fails on org.apache.mahout.math.hadoop.stochasticsvd.LocalSSVDPCASparseTest. I'll take a look at that one.

        How do you determine a good value for the linger property?

        Frank Scholten added a comment -

        OK so now I reached the lucene2seq test and added a ThreadLeakLingering(linger=1000) annotation on it but I get:

        19:46:38 Tests run: 5, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 10.492 sec <<< FAILURE! - in org.apache.mahout.text.SequenceFilesFromLuceneStorageDriverTest
        19:46:38 testRunInvalidQuery(org.apache.mahout.text.SequenceFilesFromLuceneStorageDriverTest)  Time elapsed: 2.925 sec  <<< ERROR!
        19:46:38 com.carrotsearch.randomizedtesting.ThreadLeakError: 2 threads leaked from TEST scope at testRunInvalidQuery(org.apache.mahout.text.SequenceFilesFromLuceneStorageDriverTest): 
        19:46:38    1) Thread[id=23, name=communication thread, state=TIMED_WAITING, group=TGRP-SequenceFilesFromLuceneStorageDriverTest]
        19:46:38         at java.lang.Object.wait(Native Method)
        19:46:38         at org.apache.hadoop.mapred.Task$TaskReporter.run(Task.java:658)
        19:46:38         at java.lang.Thread.run(Thread.java:724)
        19:46:38    2) Thread[id=24, name=communication thread, state=TIMED_WAITING, group=TGRP-SequenceFilesFromLuceneStorageDriverTest]
        19:46:38         at java.lang.Object.wait(Native Method)
        19:46:38         at org.apache.hadoop.mapred.Task$TaskReporter.run(Task.java:658)
        19:46:38         at java.lang.Thread.run(Thread.java:724)
        19:46:38 	at __randomizedtesting.SeedInfo.seed([2D997537AD6A02A0:3EAC405B7FBAF091]:0)
        

        There are two Hadoop task communication threads running. Shouldn't these be destroyed if a ThreadLeakLingering annotation is added?

        Suneel Marthi added a comment -

        Frank Scholten ThreadLeakLingering doesn't destroy those threads. I tried setting linger = 1000, 10000, 20000, 30000, but to no avail. One thing I did notice, though: if I remove the arg "-q invalid:query" and modify the subsequent assertions, then this passes (in MR). What's different with "-q invalid:query"?

        Dawid Weiss added a comment -

        There is no way to reliably "kill" threads in Java. You can try to interrupt them, but in general threads can catch exceptions and respin in a loop. It is the test's duty to clean up any resources it starts – including shutting down any threads or thread pools it starts up.
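
        To illustrate the point about interrupts, here is a minimal sketch using only the JDK (it is not code from the patch, and the class and method names are hypothetical). One worker honours the interrupt and exits; the other catches the exception and keeps looping, so it is still alive afterwards. The `stop` flag exists only so that the sketch itself does not leak a thread.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class InterruptSketch {

    /**
     * Starts a worker, interrupts it, and reports whether it is still alive
     * half a second later. Hypothetical helper, for illustration only.
     */
    static boolean survivesInterrupt(boolean cooperative) {
        AtomicBoolean stop = new AtomicBoolean(false);
        Thread worker = new Thread(() -> {
            while (!stop.get()) {
                try {
                    Thread.sleep(50);
                } catch (InterruptedException e) {
                    if (cooperative) {
                        return; // honour the interrupt and exit
                    }
                    // otherwise swallow the interrupt and loop again
                }
            }
        });
        worker.start();
        worker.interrupt();
        try {
            worker.join(500); // give the worker a chance to react
            return worker.isAlive();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            stop.set(true); // explicit stop signal so the sketch cleans up after itself
        }
    }

    public static void main(String[] args) {
        System.out.println("cooperative worker survives: " + survivesInterrupt(true));
        System.out.println("stubborn worker survives: " + survivesInterrupt(false));
    }
}
```

        The stubborn worker only stops once the code that started it sets the flag, which is why cleanup has to be the responsibility of whoever created the thread.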

        If you have suite-level initialized stuff (thread pools or resources), like BeforeClass or class rules, then detect thread leaks at the suite level (after all tests of a class are over), not in every test. This is done by declaring

        @ThreadLeakScope(Scope.SUITE)
        

        Thread lingering is meant only for those corner cases when one cannot join all threads that are to be terminated before the test is over (and in effect the thread may be still in an alive state). Thread pools are one example of when this is the case (no direct control over thread pool threads, although one can shutdown the threadpool itself).
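
        As a rough sketch of the thread pool case (JDK only, hypothetical names, not taken from the Mahout patch): the pool is shut down first and then awaited, and a tracking ThreadFactory is one possible way to regain a handle on the worker threads so they can be joined directly instead of relying on lingering.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;

public class PoolCleanupSketch {

    /**
     * Runs some no-op tasks on a pool, shuts the pool down, and returns the
     * number of pool threads still alive after cleanup. Hypothetical helper.
     */
    static int runAndCleanUp(int tasks) {
        // Remember every thread the pool creates so we can join it later.
        List<Thread> created = new CopyOnWriteArrayList<>();
        ThreadFactory tracking = runnable -> {
            Thread t = new Thread(runnable);
            created.add(t);
            return t;
        };
        ExecutorService pool = Executors.newFixedThreadPool(4, tracking);
        for (int i = 0; i < tasks; i++) {
            pool.execute(() -> { });
        }
        try {
            pool.shutdown(); // stop accepting work first ...
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) { // ... then wait
                pool.shutdownNow(); // interrupt workers that did not finish in time
            }
            int alive = 0;
            for (Thread t : created) {
                t.join(1000); // a worker may outlive pool termination briefly
                if (t.isAlive()) {
                    alive++;
                }
            }
            return alive;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("pool threads still alive: " + runAndCleanUp(8));
    }
}
```

        When the pool is created by library code and no factory can be supplied, the threads cannot be joined this way, which is the corner case lingering is meant for.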

        In general all this is a bit complex, but getting it to work is well worth overcoming the initial learning curve. You get a (soft, although valuable) guarantee that tests do not stomp on each other by leaving threads that do something in the background.

        Frank Scholten added a comment -

        Aha, good to know. What happens in the 'invalid query' test is that the initialize method of the LuceneSegmentRecordReader throws an IllegalArgument exception when creating the scorer. By that time the two map tasks are already created. Suneel showed that waiting for them does not work. Is there something in the Hadoop API we can use to influence these threads somehow?

        Nov 24, 2013 9:19:46 PM org.apache.hadoop.mapred.Task initialize
        INFO:  Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@1347dad
        Nov 24, 2013 9:19:46 PM org.apache.hadoop.mapred.MapTask runNewMapper
        INFO: Processing split: org.apache.mahout.text.LuceneSegmentInputSplit@1649784
        Nov 24, 2013 9:19:46 PM org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable run
        INFO: Starting task: attempt_local1401304847_0001_m_000001_0
        Nov 24, 2013 9:19:46 PM org.apache.hadoop.mapred.Task initialize
        INFO:  Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@8a6ff9
        Nov 24, 2013 9:19:46 PM org.apache.hadoop.mapred.MapTask runNewMapper
        INFO: Processing split: org.apache.mahout.text.LuceneSegmentInputSplit@b413de
        Nov 24, 2013 9:19:46 PM org.apache.hadoop.mapred.LocalJobRunner$Job run
        INFO: Map task executor complete.
        Nov 24, 2013 9:19:46 PM org.apache.hadoop.mapred.LocalJobRunner$Job run
        WARNING: job_local1401304847_0001
        java.lang.Exception: java.io.IOException: Could not create query scorer for query: invalid:query
        	at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:354)
        Caused by: java.io.IOException: Could not create query scorer for query: invalid:query
        	at org.apache.mahout.text.LuceneSegmentRecordReader.initialize(LuceneSegmentRecordReader.java:72)
        	at org.apache.mahout.text.LuceneSegmentInputFormat.createRecordReader(LuceneSegmentInputFormat.java:76)
        	at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.<init>(MapTask.java:488)
        	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:731)
        	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:364)
        	at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:223)
        	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
        	at java.util.concurrent.FutureTask.run(FutureTask.java:166)
        	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        	at java.lang.Thread.run(Thread.java:724)
        
        Dawid Weiss added a comment - - edited

        I've no idea – it's been a long time since I worked with Hadoop. I'm pretty sure it should have an API to shut it down though? Anyway, if nothing else works you can also disable thread leak checking for a class or test.

        @ThreadLeakScope(Scope.NONE)
        
        Suneel Marthi added a comment - - edited

        Frank Scholten There is a Job.killJob() but not sure how to retrieve the Job object given the Configuration.

        The other option is to not invoke the MR version of this test, which is what I ended up doing (adding the "-xm sequential" flag, like what's being done for testRun()).

        Isabel Drost-Fromm added a comment -

        Frank Scholten Without having checked the code: You create the threads during map task initialisation? Do you ever stop them e.g. when cleanup is called?

        Frank Scholten added a comment -

        Isabel Drost-Fromm The threads I am talking about are the map tasks themselves, part of the lucene2seq MR job. The initialize method of the RecordReader throws an exception due to the invalid query, which is correct because that is what we are testing. However, this exception is thrown after the map tasks are created. A Hadoop job can be aborted by throwing an exception in the RecordReader, but this does not properly clean up the map task threads before the JUnit test exits. So my question is: what's the Hadoop way of cleaning up these map tasks when the RecordReader cannot continue, in this case because of an invalid query?

        Suneel Marthi I see I can get the job ID via the context but don't know how to get to the Job. On the other hand, like Dawid Weiss said, maybe we should not check thread leaks at all in this particular test case.

        Dawid Weiss added a comment -

        Isn't it that Hadoop assumes it pretty much owns the JVM? If so then, ideally, you could fork a separate process to run the job and then terminate the process... although there is no API to do that from Java either (at least not until 1.8). Killing things is not what Java excels at.

        Suneel Marthi added a comment -

        Isabel Drost-Fromm Frank Scholten We can get past the last failing test by running in '-xm sequential' mode, otherwise is this good to go?

        Isabel Drost-Fromm added a comment -

        Fine by me.

        Frank Scholten added a comment -

        Suneel Marthi Agreed.

        Suneel Marthi added a comment -

        Isabel Drost-Fromm This can be committed to trunk. We have a separate JIRA to upgrade to Lucene 4.6.0 (if possible in the 0.9 timeline); otherwise we should upgrade to Lucene 4.5.1 so that we do not see failures on Mac OS.

        Suneel Marthi added a comment -

        Frank Scholten Is there a more recent patch than the last one from Nov 24 for this issue?

        Frank Scholten added a comment -

        Suneel Marthi No but I will create one.

        Frank Scholten added a comment -

        The current patch does not give me any build failures.

        Suneel Marthi added a comment -

        Patch committed to trunk. Had to upgrade Lucene to 4.5.1 for the Lucene tests to pass on Mac OS.

        Hudson added a comment -

        FAILURE: Integrated in Mahout-Quality #2345 (See https://builds.apache.org/job/Mahout-Quality/2345/)
        MAHOUT-1345: Enable randomised testing for all Mahout modules (smarthi: rev 1546827)

        • /mahout/trunk/core/pom.xml
        • /mahout/trunk/pom.xml

        MAHOUT-1345: Enable randomised testing for all Mahout modules (smarthi: rev 1546826)

        • /mahout/trunk/integration/pom.xml
        • /mahout/trunk/math/pom.xml

        MAHOUT-1345: Enable randomised testing for all Mahout modules (smarthi: rev 1546825)

        • /mahout/trunk/.gitignore
        • /mahout/trunk/CHANGELOG
        • /mahout/trunk/core/src/main/java/org/apache/mahout/cf/taste/impl/eval/AbstractDifferenceRecommenderEvaluator.java
        • /mahout/trunk/core/src/main/java/org/apache/mahout/classifier/sgd/AdaptiveLogisticRegression.java
        • /mahout/trunk/core/src/main/java/org/apache/mahout/clustering/classify/ClusterClassificationDriver.java
        • /mahout/trunk/core/src/main/java/org/apache/mahout/clustering/lda/cvb/CachingCVB0Mapper.java
        • /mahout/trunk/core/src/main/java/org/apache/mahout/clustering/lda/cvb/CachingCVB0PerplexityMapper.java
        • /mahout/trunk/core/src/main/java/org/apache/mahout/clustering/lda/cvb/TopicModel.java
        • /mahout/trunk/core/src/main/java/org/apache/mahout/common/lucene/AnalyzerUtils.java
        • /mahout/trunk/core/src/main/java/org/apache/mahout/ep/EvolutionaryProcess.java
        • /mahout/trunk/core/src/test/java/org/apache/mahout/cf/taste/impl/recommender/svd/ALSWRFactorizerTest.java
        • /mahout/trunk/core/src/test/java/org/apache/mahout/cf/taste/impl/recommender/svd/ParallelSGDFactorizerTest.java
        • /mahout/trunk/core/src/test/java/org/apache/mahout/classifier/sgd/AdaptiveLogisticRegressionTest.java
        • /mahout/trunk/core/src/test/java/org/apache/mahout/classifier/sgd/ModelSerializerTest.java
        • /mahout/trunk/core/src/test/java/org/apache/mahout/ep/EvolutionaryProcessTest.java
        • /mahout/trunk/core/src/test/java/org/apache/mahout/vectorizer/HighDFWordsPrunerTest.java
        • /mahout/trunk/core/src/test/java/org/apache/mahout/vectorizer/encoders/TextValueEncoderTest.java
        • /mahout/trunk/examples/pom.xml
        • /mahout/trunk/examples/src/main/java/org/apache/mahout/classifier/NewsgroupHelper.java
        • /mahout/trunk/integration/src/main/java/org/apache/mahout/text/LuceneStorageConfiguration.java
        • /mahout/trunk/integration/src/main/java/org/apache/mahout/text/MailArchivesClusteringAnalyzer.java
        • /mahout/trunk/integration/src/main/java/org/apache/mahout/text/SequenceFilesFromLuceneStorageDriver.java
        • /mahout/trunk/integration/src/main/java/org/apache/mahout/text/wikipedia/WikipediaAnalyzer.java
        • /mahout/trunk/integration/src/main/java/org/apache/mahout/utils/regex/AnalyzerTransformer.java
        • /mahout/trunk/integration/src/test/java/org/apache/mahout/clustering/TestClusterDumper.java
        • /mahout/trunk/integration/src/test/java/org/apache/mahout/text/AbstractLuceneStorageTest.java
        • /mahout/trunk/integration/src/test/java/org/apache/mahout/utils/nlp/collocations/llr/BloomTokenFilterTest.java
        • /mahout/trunk/integration/src/test/java/org/apache/mahout/utils/vectors/lucene/CachedTermInfoTest.java
        • /mahout/trunk/integration/src/test/java/org/apache/mahout/utils/vectors/lucene/DriverTest.java
        • /mahout/trunk/integration/src/test/java/org/apache/mahout/utils/vectors/lucene/LuceneIterableTest.java
        • /mahout/trunk/math/src/main/java/org/apache/mahout/math/decomposer/AsyncEigenVerifier.java
        • /mahout/trunk/math/src/test/java/org/apache/mahout/math/MahoutTestCase.java
        • /mahout/trunk/math/src/test/java/org/apache/mahout/math/decomposer/hebbian/TestHebbianSolver.java
        Suneel Marthi added a comment -

        The following test failed in the Hudson build:

        
        Tests run: 2, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 35.838 sec <<< FAILURE! - in org.apache.mahout.math.hadoop.stochasticsvd.LocalSSVDPCASparseTest
        testOmegaTRightMultiply(org.apache.mahout.math.hadoop.stochasticsvd.LocalSSVDPCASparseTest)  Time elapsed: 0.192 sec  <<< ERROR!
        com.carrotsearch.randomizedtesting.ThreadLeakError: 5 threads leaked from TEST scope at testOmegaTRightMultiply(org.apache.mahout.math.hadoop.stochasticsvd.LocalSSVDPCASparseTest): 
           1) Thread[id=108, name=pool-8-thread-8, state=TERMINATED, group={null group}]
                at (empty stack)
        	at __randomizedtesting.SeedInfo.seed([784E5A6F09573D1:1727E47CFE1A98B7]:0)
        
        

        I don't see the failure locally on my machine (Mac OS); will look at it later today. Not sure whether this happens on an Ubuntu instance.

        Dawid Weiss added a comment -
        Thread[id=108, name=pool-8-thread-8, state=TERMINATED, group={null group}]
        

        Probably a slower system – the thread is in TERMINATED state but not dead yet. Add some lingering to MahoutTestCase (it also configures other stuff):

        /**
         * Superclass of all Mahout test cases.
         */
        @ThreadLeakScope(Scope.SUITE)
        @ThreadLeakAction({Action.WARN, Action.INTERRUPT})
        @ThreadLeakLingering(linger = 20000) // Wait a bit longer for leaked threads to die.
        @ThreadLeakZombies(Consequence.IGNORE_REMAINING_TESTS)
        @TimeoutSuite(millis = 2 * TimeUnits.HOUR)
        public abstract class MahoutTestCase extends RandomizedTest {
        ...
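
        Conceptually, lingering amounts to giving suspect threads a bounded amount of extra time to die before reporting them. The sketch below shows that idea with plain JDK calls; it is an illustration under that assumption, not how the randomizedtesting library implements ThreadLeakLingering, and all names in it are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class LingerSketch {

    /**
     * Waits up to lingerMillis in total for the given threads to die and
     * returns the ones that are still alive afterwards.
     */
    static List<Thread> stillAliveAfter(List<Thread> suspects, long lingerMillis) {
        long deadline = System.nanoTime() + lingerMillis * 1_000_000L;
        List<Thread> leaked = new ArrayList<>();
        try {
            for (Thread t : suspects) {
                long remaining = (deadline - System.nanoTime()) / 1_000_000L;
                if (remaining > 0) {
                    t.join(remaining); // join(0) would wait forever, hence the guard
                }
                if (t.isAlive()) {
                    leaked.add(t);
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return leaked;
    }

    public static void main(String[] args) {
        Thread quick = new Thread(() -> {
            try {
                Thread.sleep(100);
            } catch (InterruptedException ignored) {
                // nothing to clean up in this sketch
            }
        });
        quick.start();
        List<Thread> leaked = stillAliveAfter(Collections.singletonList(quick), 20000);
        System.out.println("leaked threads: " + leaked.size());
    }
}
```

        A thread that finishes within the linger window is not reported; one that never finishes still is, so lingering hides slow shutdowns but not genuine leaks.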
        
        Suneel Marthi added a comment -

        Thanks Dawid Weiss. It does seem like a slow system; the most recent Hudson build was successful. Nevertheless, I will add the annotations from your previous comment to MahoutTestCase to prevent this from happening in the future.

        Suneel Marthi added a comment -

        Dawid Weiss Added the annotations to MahoutTestCase as suggested by you. There was another random failure reported by Hudson:

        
        Tests run: 6, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 2.935 sec <<< FAILURE! - in org.apache.mahout.cf.taste.impl.recommender.svd.ALSWRFactorizerTest
        toyExampleImplicit(org.apache.mahout.cf.taste.impl.recommender.svd.ALSWRFactorizerTest)  Time elapsed: 0.114 sec  <<< ERROR!
        com.carrotsearch.randomizedtesting.ThreadLeakError: 1 thread leaked from TEST scope at toyExampleImplicit(org.apache.mahout.cf.taste.impl.recommender.svd.ALSWRFactorizerTest): 
           1) Thread[id=49, name=pool-10-thread-4, state=RUNNABLE, group=TGRP-ALSWRFactorizerTest]
                at sun.misc.Unsafe.unpark(Native Method)
                at java.util.concurrent.locks.LockSupport.unpark(LockSupport.java:122)
                at java.util.concurrent.locks.AbstractQueuedSynchronizer.unparkSuccessor(AbstractQueuedSynchronizer.java:640)
                at java.util.concurrent.locks.AbstractQueuedSynchronizer.release(AbstractQueuedSynchronizer.java:1242)
                at java.util.concurrent.locks.ReentrantLock.unlock(ReentrantLock.java:431)
                at java.util.concurrent.ThreadPoolExecutor.workerDone(ThreadPoolExecutor.java:1023)
                at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:922)
                at java.lang.Thread.run(Thread.java:662)
        	at __randomizedtesting.SeedInfo.seed([A5759B90EF18631F:5EA19EC521877CAD]:0)
        
        

        Hopefully, the fix suggested by you should take care of all these random failures.

        Hudson added a comment -

        SUCCESS: Integrated in Mahout-Quality #2347 (See https://builds.apache.org/job/Mahout-Quality/2347/)
        MAHOUT-1345: Added Carrot Randomized test annotations to base MahoutTestCase to avoid random Test failures (as suggested by Dawid Weiss). (smarthi: rev 1546869)

        • /mahout/trunk/CHANGELOG
        • /mahout/trunk/math/pom.xml
        • /mahout/trunk/math/src/test/java/org/apache/mahout/math/MahoutTestCase.java
        Suneel Marthi added a comment -

        The last fix did it. Thanks Dawid. We are good now.


          People

          • Assignee:
            Suneel Marthi
            Reporter:
            Isabel Drost-Fromm
          • Votes:
            0
            Watchers:
            7
