Uploaded image for project: 'Solr'
  1. Solr
  2. SOLR-11455

change SoftAutoCommitTest to check when events *start* -- not when they finish

Details

    • Improvement
    • Status: Open
    • Major
    • Resolution: Unresolved
    • None
    • None
    • None
    • None

    Description

      AFAICT, most "slow machine" related (jenkins) failures in SoftAutoCommitTest seem to be exacerbated by the fact that the SolrEventListener model fires an effent when the action (softCommit, hardCommit, newSearcher) finishes successfully – w/o paying any attention to when it starts.

      We should consider taking a more white/grey box approach to checking both the start & finish of these events – and moving our autoCommit timing expectations to the "start" of the commits, with much more relaxed (ie: many seconds) checks that the event finished successfully.

      Attachments

        1. SoftAutoCommitTest.jenkins.1398.txt
          16.63 MB
          Chris M. Hostetter

        Issue Links

          Activity

            some of the comments in SOLR-11454 are an example of where tracking the (newSearcher) event finish results in a failure but tracking the even start would show it happened very quickly.

            The attached jenkins log (SoftAutoCommitTest.jenkins.1398.txt) is another example...
            https://builds.apache.org/job/Lucene-Solr-NightlyTests-master/1398/

            Error Message:
            softCommit: did not find a single commit
            
            Stack Trace:
            java.lang.AssertionError: softCommit: did not find a single commit
                    at __randomizedtesting.SeedInfo.seed([CCAC87827208491B:90B929BB998A0863]:0)
                    at org.junit.Assert.fail(Assert.java:93)
                    at org.junit.Assert.assertTrue(Assert.java:43)
                    at org.junit.Assert.assertNotNull(Assert.java:526)
                    at org.apache.solr.update.SoftAutoCommitTest.assertRapidMultiCommitQueues(SoftAutoCommitTest.java:475)
                    at org.apache.solr.update.SoftAutoCommitTest.testSoftAndHardCommitMaxTimeRapidAdds(SoftAutoCommitTest.java:440)
            

            After some rapid fire adds, the test does a loop looking for an expected# of commits based on the relative pre/post timestamp of the adds (ie: at least 1 softCommit, maybemore more if the total time to add the docs exceeded the autoCommitTime) .. but in this vase it fails because it didn't dedect a softCommit finish in the expected wait time.

            But according to the logs, the autoSoftCommit started exactly when expected (508ms after the first doc was added, or 8ms after the autoSoftCommitMillis=500 said it should) ...

               [junit4]   2> 2487332 INFO  (TEST-SoftAutoCommitTest.testSoftAndHardCommitMaxTimeRapidAdds-seed#[CCAC87827208491B]) [    ] o.a.s.u.p.LogUpdateProcessorFactory [collection1]  webapp=null path=null params={}{add=[5000 (1580608950822240256)]} 0 28
            ...
               [junit4]   2> 2487840 INFO  (commitScheduler-2095-thread-1) [    ] o.a.s.u.DirectUpdateHandler2 start commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=true,prepareCommit=false}
            

            ...it just didn't finish as quickly as the monitor polling expected it to.

            hossman Chris M. Hostetter added a comment - some of the comments in SOLR-11454 are an example of where tracking the (newSearcher) event finish results in a failure but tracking the even start would show it happened very quickly. The attached jenkins log (SoftAutoCommitTest.jenkins.1398.txt) is another example... https://builds.apache.org/job/Lucene-Solr-NightlyTests-master/1398/ Error Message: softCommit: did not find a single commit Stack Trace: java.lang.AssertionError: softCommit: did not find a single commit at __randomizedtesting.SeedInfo.seed([CCAC87827208491B:90B929BB998A0863]:0) at org.junit.Assert.fail(Assert.java:93) at org.junit.Assert.assertTrue(Assert.java:43) at org.junit.Assert.assertNotNull(Assert.java:526) at org.apache.solr.update.SoftAutoCommitTest.assertRapidMultiCommitQueues(SoftAutoCommitTest.java:475) at org.apache.solr.update.SoftAutoCommitTest.testSoftAndHardCommitMaxTimeRapidAdds(SoftAutoCommitTest.java:440) After some rapid fire adds, the test does a loop looking for an expected# of commits based on the relative pre/post timestamp of the adds (ie: at least 1 softCommit, maybemore more if the total time to add the docs exceeded the autoCommitTime) .. but in this vase it fails because it didn't dedect a softCommit finish in the expected wait time. But according to the logs, the autoSoftCommit started exactly when expected (508ms after the first doc was added, or 8ms after the autoSoftCommitMillis=500 said it should) ... [junit4] 2> 2487332 INFO (TEST-SoftAutoCommitTest.testSoftAndHardCommitMaxTimeRapidAdds-seed#[CCAC87827208491B]) [ ] o.a.s.u.p.LogUpdateProcessorFactory [collection1] webapp=null path=null params={}{add=[5000 (1580608950822240256)]} 0 28 ... [junit4] 2> 2487840 INFO (commitScheduler-2095-thread-1) [ ] o.a.s.u.DirectUpdateHandler2 start commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=true,prepareCommit=false} ...it just didn't finish as quickly as the monitor polling expected it to.

            One possibility would be to make this test use a subclass of DUH2 that called special methods on our MockEventListener when the UpdateHandler.commit(...) method is called – prior to calling super.commit(...) this could be used to add timestamps to new BlockingQueue<Long> preSoft and BlockingQueue<Long> preHard which could be checked with timing expectations equal to the existing checks on the existing soft/hard queues, and new more relaxed checks could be used to ensure that these commits do in fact eventually finish (and trigger the normal EventListener methods)

            hossman Chris M. Hostetter added a comment - One possibility would be to make this test use a subclass of DUH2 that called special methods on our MockEventListener when the UpdateHandler.commit(...) method is called – prior to calling super.commit(...) this could be used to add timestamps to new BlockingQueue<Long> preSoft and BlockingQueue<Long> preHard which could be checked with timing expectations equal to the existing checks on the existing soft/hard queues, and new more relaxed checks could be used to ensure that these commits do in fact eventually finish (and trigger the normal EventListener methods)

            NOTE: I initially filed this as a sub-task of SOLR-11053, but in hindsight i think that was a mistake. It's a general improvement that doesn't relate to if/when SoftAutoCommitTest is better then (Hard)AutoCommitTest – those tests actually suffer fromt he exact same problem – so i've converted to a top level Improvement task

            hossman Chris M. Hostetter added a comment - NOTE: I initially filed this as a sub-task of SOLR-11053 , but in hindsight i think that was a mistake. It's a general improvement that doesn't relate to if/when SoftAutoCommitTest is better then (Hard)AutoCommitTest – those tests actually suffer fromt he exact same problem – so i've converted to a top level Improvement task

            People

              Unassigned Unassigned
              hossman Chris M. Hostetter
              Votes:
              0 Vote for this issue
              Watchers:
              0 Start watching this issue

              Dates

                Created:
                Updated: