Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 5.0
    • Fix Version/s: 5.0, 6.0
    • Component/s: general/test
    • Labels:
      None
    • Environment:
    • Lucene Fields:
      New, Patch Available

      Description

      The tests labelled @Monster are having a very hard time passing. Initially, there were some real failures on tests that push the 2 billion envelope, as well as some test bugs regarding suite timeouts and the quantity of data sent to sysout. Michael McCandless committed a fix for the real failures and the initial sysout problems.

      Trying to move @SuppressSysoutChecks to the definition of @Monster didn't work. I'm also having trouble defining the suite timeouts. One test that says it takes about 45 minutes hit the six-hour timeout that I had configured for the suite, and failed.

      What sort of machine should we use as a "normal" machine for deciding on suite timeouts? My machine is over four years old, but it's got three 2.5GHz cores and 12GB of RAM. That's certainly not a completely modern machine, but it's not exactly slow.

      1. LUCENE-6002.patch
        12 kB
        Shawn Heisey
      2. LUCENE-6002.patch
        13 kB
        Shawn Heisey
      3. LUCENE-6002.patch
        13 kB
        Shawn Heisey
      4. LUCENE-6002.patch
        7 kB
        Shawn Heisey
      5. LUCENE-6002.patch
        7 kB
        Shawn Heisey

        Issue Links

          Activity

          Shawn Heisey added a comment -

          I think my first order of business would be to figure out how to successfully add SuppressSysoutChecks and a large suite timeout to Monster itself so that these items do not need to be configured on the individual tests. I added @SuppressSysoutChecks with a bugUrl string immediately before the "public @interface Monster" line and had a test tagged @Monster fail, saying that there was too much output on sysout.
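
          Roughly what the attempt looked like, as a sketch (the bugUrl value is a placeholder, and the meta-annotations shown on Monster are from memory, so details may differ):

          @SuppressSysoutChecks(bugUrl = "https://issues.apache.org/jira/browse/LUCENE-6002")
          @Documented
          @Inherited
          @Retention(RetentionPolicy.RUNTIME)
          public @interface Monster {
            String value(); // free-form description of the test's resource appetite
          }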

          Dawid Weiss, I think you're our resident expert on the test infrastructure. Is there something I might have done wrong when adding an annotation to Monster?

          Ryan Ernst added a comment -

          If you want to ignore sysout checks for all monster tests, why not add checking for the Monster annotation in the same place the SuppressSysoutChecks annotation is checked (TestRuleLimitSysouts)? Then you won't need a fake bugUrl. Also, I think the reason it doesn't work is that the SuppressSysoutChecks annotation you are adding is on the Monster annotation itself. So if you wanted to make that work, I think you would have to check all the annotations recursively for suppress sysout, but I don't think we want to do that.

          Dawid Weiss added a comment -

          SuppressSysoutChecks is an annotation applied to test classes, not to other annotations. In other words, it's not a meta-annotation (it's an interesting idea to support this scenario, though).

          Like Ryan said – if you want to disable sysout checks on a selected group of tests, include the code to do so in TestRuleLimitSysouts directly. There's really not that much magic going on here.

          Shawn Heisey added a comment -

          I think I actually figured out how to modify TestRuleLimitSysouts. I've got isEnforced returning false if the Monster annotation is present.
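
          Something like this, as a sketch (the real method in TestRuleLimitSysouts has more logic, and names may differ slightly):

          // Inside TestRuleLimitSysouts: skip enforcement for monster suites,
          // since they legitimately produce huge amounts of sysout.
          boolean isEnforced() {
            Class<?> target = RandomizedContext.current().getTargetClass();
            if (target.isAnnotationPresent(Monster.class)) {
              return false;
            }
            if (target.isAnnotationPresent(SuppressSysoutChecks.class)) {
              return false;
            }
            return true;
          }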

          Is TimeoutSuite an annotation that I could apply to Monster? From what I can tell, this check is entirely in carrot2 code.

          Dawid Weiss added a comment -

          No. There are no "meta" support annotations. You need to apply them separately to each class annotated with Monster. Note that suite timeout applies to entire suites, not individual test cases. The easiest way to accomplish that would be to isolate a superclass MonsterTestCase from which all the monsters would inherit. Call it a mother-monster if you will. Then apply any annotation on that superclass (they are inherited).
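
          A sketch of what I mean (the class name and timeout value are hypothetical):

          // Hypothetical common parent; annotations here are inherited by subclasses.
          @SuppressSysoutChecks(bugUrl = "https://issues.apache.org/jira/browse/LUCENE-6002")
          @TimeoutSuite(millis = 24 * TimeUnits.HOUR) // generous suite-level timeout
          public abstract class MonsterTestCase extends LuceneTestCase {
          }

          // Each monster test then extends the mother-monster:
          public class Test2BTerms extends MonsterTestCase {
            // ...
          }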

          All annotations (and the test runner code) are part of the randomizedtesting project and have nothing whatsoever to do with what we do in open source (Carrot2) or commercially (Carrot Search). Do not spread confusion; it stays on the internet.

          https://github.com/carrotsearch/randomizedtesting

          Dawid Weiss added a comment -

          I thought about the meta-annotation concept a bit (so that annotations could be applied to interfaces, then inherited). The problem with this is that you then hit multiple-inheritance problems which would have to be dealt with somehow. Now it's relatively easy because it follows Java rules for inheriting annotations from superclasses. I think I'll stick with this model.

          Robert Muir added a comment -

          +1 to directly fix the monster tests. Also thanks for looking at them, as unfortunately I think they are very rarely run.

          My thoughts are: try harder not to depend on them completely for coverage of what they test, and avoid creating monster tests whenever possible. They have no continuous integration... I like the idea of monster tests that are doing "real" things, but they aren't and will never be unit tests; they are not fun to debug, and if nobody is running them then they are not helping.

          We should look at each monster test and think about alternative ways to test the scary parts, e.g. even if we can directly pass billions of things through the codec API instead of actually indexing billions of docs, it might be fast enough for nightly... who knows.

          A good example is TestIndexWriterMaxDocs: it does in fact have a monster test that writes 2B docs. But this is kept very simple and only a sanity check really, because it has tons of other non-monster actual unit tests that e.g. change IW's limit and other tricks to test the boundary condition.
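
          The shape of those non-monster boundary tests looks roughly like this (a sketch; it assumes the test framework's hook for lowering IW's limit, and helper names may differ):

          public void testMaxDocsLimit() throws Exception {
            setIndexWriterMaxDocs(10); // lower the 2B limit to something a unit test can hit
            try {
              Directory dir = newDirectory();
              IndexWriter w = new IndexWriter(dir, new IndexWriterConfig(null));
              for (int i = 0; i < 10; i++) {
                w.addDocument(new Document());
              }
              try {
                w.addDocument(new Document()); // the 11th doc must be rejected
                fail("didn't hit exception");
              } catch (IllegalArgumentException expected) {
                // expected: over the configured limit
              }
              w.close();
              dir.close();
            } finally {
              restoreIndexWriterMaxDocs(); // always put the real limit back
            }
          }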

          Shawn Heisey added a comment - edited

          With this patch applied, the following test command line passed. I believe this hits all the @Monster tests:

          ant -Dtests.jvms=2 -Dtests.monster=true -Dtests.heapsize=5g -Dtests.class='*.Test2B*|.*TestIndexWriterMaxDocs' clean test | tee ~/monster-testlog.txt
          

          I may have gotten too specific in the info included in the test annotations; I'd like to know what others think.

          I'm trying the patch out on a larger suite of tests (monster, nightly, and weekly) using this command line:

          ant -Dtests.jvms=2 -Dtests.heapsize=5g -Dtests.nightly=true -Dtests.weekly=true -Dtests.monster=true clean test | tee ~/b5x-testlog.txt
          

          It will be many hours before I know whether this works.

          Dawid Weiss added a comment -
          Monster("takes ~20GB-30GB of space and 10 minutes, and more heap space sometimes")
          

          All these estimates are really hardware-dependent and may be off by an order of magnitude, I think (just as you can see in the diff). SSD vs. regular spindle may be one factor, default JVM heap another one... you can multiply.

          This data is valuable; it's just that without context it's no reference for anybody – you should at least include the specs of the machine on which these were measured.

          Robert Muir added a comment -

          +1 to commit this. We should look into the huge time discrepancies after that. Maybe some of the tests allow SimpleText or some other slow codec and should just be wired to the default codec. Maybe some of AssertingCodec's checks are too costly.

          Shawn Heisey added a comment - edited

          Updated patch with comments about the specs of my test system.

          Robert's speculation about a slow codec being chosen by the randomization makes sense; it would explain why the person who initially added the @Monster annotation had such a wildly different runtime than I'm seeing. I won't discount differences in hardware, though. This machine was built four years ago, and it wasn't top-shelf hardware even then. I was trying to hit that sweet spot of performance vs. cost.

          Shawn Heisey added a comment -

          If you can tell me how to limit the codecs, which codecs are the most important to test, and which tests should be limited in this way, I'll be happy to add that. I'm also fine with it being handled in another issue.

          Robert Muir added a comment -

          In my opinion these tests should supply the following to IndexWriterConfig:

          .setCodec(TestUtil.getDefaultCodec())
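
          For instance, as a sketch ('analyzer' and 'dir' here are placeholders):

          IndexWriterConfig iwc = new IndexWriterConfig(analyzer)
              .setCodec(TestUtil.getDefaultCodec()); // pin the real default codec, not a random one
          IndexWriter w = new IndexWriter(dir, iwc);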
          
          Robert Muir added a comment -

          I would also look for things like use of newField() <-- might introduce things like term vectors or other options that bloat space, newIndexWriterConfig() <-- might pick strangely inefficient IW options, etc. It would be good to see if any are using MockAnalyzer and think about how to disable payload randomness there and so forth.

          In general, I don't think monster tests should be randomized. These are integration tests at best and should just test our defaults.
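
          To make the contrast concrete, a sketch (field names, 'doc', 'dir', and 'analyzer' are placeholders):

          // Randomized test-framework helpers: good for unit tests, risky for monsters.
          doc.add(newTextField("body", text, Field.Store.NO));                   // may randomly add term vectors etc.
          IndexWriter w1 = new IndexWriter(dir, newIndexWriterConfig(analyzer)); // random merge/buffer settings

          // Deterministic defaults: closer to what a monster test should use.
          doc.add(new TextField("body", text, Field.Store.NO));
          IndexWriter w2 = new IndexWriter(dir, new IndexWriterConfig(analyzer)
              .setCodec(TestUtil.getDefaultCodec()));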

          Shawn Heisey added a comment -

          > In my opinion these tests should supply the following to IndexWriterConfig:

          I will look deeper into the tests to see if I can figure out how to apply the knowledge you're imparting here and in your next comment. I'm not 100% ignorant of how Lucene works internally, but I do find myself drowning easily whenever I start looking at it.

          Robert Muir added a comment -

          These are just ideas for cleanup... but the current patch is good!

          Shawn Heisey added a comment -

          Another patch update. In all the places in Monster tests where it was obvious how to do it, the default codec is now selected for IndexWriterConfig.

          Dawid Weiss added a comment -

          > I won't discount differences in hardware, though. This machine was built four years ago, and it wasn't top-shelf hardware even then. I was trying to hit that sweet spot of performance vs. cost.

          All I'm saying is that there's no mention of what hardware these figures in your comments come from – without such a point of reference I think they're pretty useless (except for comparing relative cost). I'd add the test machine specs in the Monster annotation's javadoc, for example. Is this a problem?

          Shawn Heisey added a comment -

          Updating the patch. It does have some comments about the hardware, near the Monster annotation in each place where a specific time is listed.

          I'm good with putting the info anywhere that makes sense. If it should be elsewhere, let me know.

          On my last test run with the changes in this patch, all Lucene tests passed with monster, nightly, and weekly enabled. I will review the log from that run to see whether the approximate times were changed by choosing the default codec.

          It said that 40 Solr tests failed on that last run. I have not yet compiled a list of reasons for those failures, but there was mention of running out of PermGen space in the stderr info right above the list of the first ten failed tests. I assume that wouldn't be a problem if I were to use Java 8.

          Because there are no Monster tests in Solr, I have to assume it's nightly or weekly. I will wait for Dawid's input on where to mention hardware specs, then commit the resulting patch. I'll open another issue for the Solr test failures.

          Michael McCandless contributed via commits r1629362, r1629363, r1629366 and r1629367. This issue didn't exist at the time.

          Shawn Heisey added a comment -

          Minor patch update, adjusting some time estimates.

          Shawn Heisey added a comment -

          These are the top five slowest tests from the last run:

          [junit4:tophints] 23660.51s | org.apache.lucene.index.Test2BSortedDocValues
          [junit4:tophints] 20029.48s | org.apache.lucene.index.Test2BBinaryDocValues
          [junit4:tophints] 11658.54s | org.apache.lucene.index.Test2BTerms
          [junit4:tophints] 8377.50s | org.apache.lucene.index.TestIndexWriterMaxDocs
          [junit4:tophints] 7932.65s | org.apache.lucene.index.Test2BNumericDocValues
          
          Dawid Weiss added a comment -

          Looks great, Shawn, thanks!

          ASF subversion and git services added a comment -

          Commit 1631290 from Shawn Heisey in branch 'dev/trunk'
          [ https://svn.apache.org/r1631290 ]

          LUCENE-6002: Fix monster tests so they pass.

          ASF subversion and git services added a comment -

          Commit 1631291 from Shawn Heisey in branch 'dev/branches/branch_5x'
          [ https://svn.apache.org/r1631291 ]

          LUCENE-6002: Fix monster tests so they pass (backport r1631290 from trunk).

          Shawn Heisey added a comment -

          A second test run in lucene/ (weekly, nightly, and monster, two 5GB JVMs) passed. Took 634 minutes.

          Anshum Gupta added a comment -

          Bulk close after 5.0 release.


            People

            • Assignee:
              Shawn Heisey
              Reporter:
              Shawn Heisey
            • Votes:
               0
              Watchers:
               5

              Dates

              • Created:
                Updated:
                Resolved:
