Solr / SOLR-8279

Add a new test fault injection approach and a new SolrCloud test that stops and starts the cluster while indexing data and with random faults.

    Details

    • Type: Test
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 5.5, 6.0
    • Component/s: None
    • Labels:
      None
    1. SOLR-8279.patch
      43 kB
      Mark Miller
    2. SOLR-8279.patch
      45 kB
      Mark Miller
    3. SOLR-8279.patch
      40 kB
      Mark Miller
    4. SOLR-8279.patch
      39 kB
      Mark Miller
    5. SOLR-8279.patch
      6 kB
      Mark Miller
    6. SOLR-8279.patch
      6 kB
      Mark Miller

        Activity

        Mark Miller added a comment -

        Started some work on a new test.

        ASF subversion and git services added a comment -

        Commit 1714216 from Mark Miller in branch 'dev/trunk'
        [ https://svn.apache.org/r1714216 ]

        SOLR-8279: Add a new SolrCloud test that stops and starts the cluster while indexing data.

        ASF subversion and git services added a comment -

        Commit 1714218 from Mark Miller in branch 'dev/branches/branch_5x'
        [ https://svn.apache.org/r1714218 ]

        SOLR-8279: Add a new SolrCloud test that stops and starts the cluster while indexing data.

        Mark Miller added a comment -

        I'm going to do some more work on this, but committing what I have for now.

        Mike Drob added a comment - edited

        + threads = new ArrayList<>(2);

        Should be ArrayList<>(numThreads);

        + thread.safeStop();
        + thread.safeStop();

        Typo, or some nuance here?

        + public void stopAndStartAllReplicas() throws Exception, InterruptedException {
        +   chaosMonkey.stopAll(random().nextInt(2000));
        +
        +   Thread.sleep(1000);
        +
        +   chaosMonkey.startAll();
        + }

        Is sleeping for one second sufficient here? Do we want to instead sleep until some condition is met (like all the servers are fully down, in case there is a straggler)?
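
One way to address the fixed-sleep concern is to poll for a condition with a timeout. The following is a minimal sketch, assuming a hypothetical `WaitFor.waitFor` helper that is not part of the patch or of the Solr test framework. The condition itself (for example, a check that every server has fully stopped) would have to be supplied by the test.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.BooleanSupplier;

// Hypothetical helper for illustration only; not an existing Solr class.
final class WaitFor {
  private WaitFor() {}

  /**
   * Polls the condition every pollMs milliseconds until it returns true.
   * Throws TimeoutException if it is still false after timeoutMs milliseconds.
   */
  static void waitFor(BooleanSupplier condition, long timeoutMs, long pollMs)
      throws InterruptedException, TimeoutException {
    long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
    while (!condition.getAsBoolean()) {
      // Overflow-safe comparison against the deadline.
      if (System.nanoTime() - deadline >= 0) {
        throw new TimeoutException("condition not met within " + timeoutMs + " ms");
      }
      Thread.sleep(pollMs);
    }
  }
}
```

In `stopAndStartAllReplicas()`, a call such as `WaitFor.waitFor(() -> allServersStopped(), 30_000, 100)` could then stand in for `Thread.sleep(1000)`, where `allServersStopped()` is likewise a hypothetical check.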

        Mark Miller added a comment -

        This initial test has actually been committed already. I'm working on a much beefed-up version, though, with additional fault injection to test various cluster restart issues.

        Mark Miller added a comment -

        Here is my current state. It needs all kinds of cleanup and improvement before it could be committed, but it is already useful for finding issues.

        Mark Miller added a comment -

        This is much closer to committable. A few issues left to work out.

        I've added a TestInjection class that lets you turn on random test fault injection, currently when sending a doc to a replica or when shutting down. I still have to make it so that when you turn on a fault injection you can specify its odds of hitting.
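
The idea of fault injection with configurable odds can be sketched as follows. This is an illustration under stated assumptions, not the TestInjection class from the patch: the `FaultInjector` name, its methods, and the percentage-based odds are all hypothetical.

```java
import java.util.Random;

// Hypothetical sketch; names and behavior are assumptions, not Solr's TestInjection.
final class FaultInjector {
  private static volatile Random random;   // null means injection is switched off
  private static volatile int failPercent; // odds of a fault per call, 0..100

  private FaultInjector() {}

  /** Turns injection on with the given odds; the seed keeps runs reproducible. */
  static void enable(long seed, int percent) {
    if (percent < 0 || percent > 100) {
      throw new IllegalArgumentException("percent must be in [0, 100]: " + percent);
    }
    failPercent = percent;
    random = new Random(seed);
  }

  /** Turns injection off again. */
  static void clear() {
    random = null;
    failPercent = 0;
  }

  /** Called at a point in the code under test where a fault may be injected. */
  static void maybeFail(String where) {
    Random r = random;
    if (r != null && r.nextInt(100) < failPercent) {
      throw new IllegalStateException("injected fault at " + where);
    }
  }
}
```

A test would call `FaultInjector.enable(seed, 10)` before indexing and the code under test would call `FaultInjector.maybeFail(...)` at each injection point; with injection off, the call does nothing beyond a null check.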

        I've also made it so that we can get away with not closing things in a valid test by having the ObjectReleaseTracker close unclosed objects at the end of the test if its release check has been disabled on the test via annotation.
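
The release-tracking behavior described above can be sketched roughly as follows. This is a simplified stand-in, not Solr's ObjectReleaseTracker: the `ReleaseTracker` class and its `track`, `release`, and `finish` methods are hypothetical, and a boolean parameter takes the place of the annotation.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

// Hypothetical sketch; not the real ObjectReleaseTracker API.
final class ReleaseTracker {
  // Identity-based set so that equals() overrides do not merge distinct objects.
  private static final Set<AutoCloseable> OPEN = Collections.synchronizedSet(
      Collections.newSetFromMap(new IdentityHashMap<AutoCloseable, Boolean>()));

  private ReleaseTracker() {}

  static <T extends AutoCloseable> T track(T resource) {
    OPEN.add(resource);
    return resource;
  }

  static void release(AutoCloseable resource) {
    OPEN.remove(resource);
  }

  /**
   * End-of-test hook. If the release check is disabled, closes whatever is
   * still open and returns how many objects that was; otherwise fails on leaks.
   */
  static int finish(boolean releaseCheckDisabled) {
    List<AutoCloseable> leaked;
    synchronized (OPEN) {
      leaked = new ArrayList<>(OPEN);
      OPEN.clear();
    }
    if (leaked.isEmpty()) {
      return 0;
    }
    if (!releaseCheckDisabled) {
      throw new AssertionError(leaked.size() + " object(s) were not released");
    }
    for (AutoCloseable c : leaked) {
      try {
        c.close();
      } catch (Exception e) {
        // Best-effort cleanup: keep closing the remaining objects.
      }
    }
    return leaked.size();
  }
}
```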

        Mark Miller added a comment -

        All the nocommits and such are out. I think this is pretty close, but I still need to go over it once more.

        ASF subversion and git services added a comment -

        Commit 1720613 from Mark Miller in branch 'dev/trunk'
        [ https://svn.apache.org/r1720613 ]

        SOLR-8279: Add a new test fault injection approach and a new SolrCloud test that stops and starts the cluster while indexing data and with random faults.

        Mark Miller added a comment -

        There is the commit to trunk. Reviews welcome. This was a bit of a beast to get done in a way that could be run as part of the normal test framework, coming from my original "just hack together a test I can run" approach, but I think I've now got a great base for adding more failure / fault injection tests.

        ASF subversion and git services added a comment -

        Commit 1720624 from Mark Miller in branch 'dev/trunk'
        [ https://svn.apache.org/r1720624 ]

        SOLR-8279: Close factories in unrelated test.

        ASF subversion and git services added a comment -

        Commit 1720627 from Mark Miller in branch 'dev/trunk'
        [ https://svn.apache.org/r1720627 ]

        SOLR-8279: end searcher tracking before object release tracker.

        ASF subversion and git services added a comment -

        Commit 1720631 from Mark Miller in branch 'dev/trunk'
        [ https://svn.apache.org/r1720631 ]

        SOLR-8279: Do not fail tests due to searcher tracking - just use that for waiting and use ObjectReleaseTracker for the fail since it has more detailed info.

        Mark Miller added a comment -

        I'll give Jenkins some time before backporting this to 5x.

        Mark Miller added a comment -

        SOLR-8371 is just a really good improvement in general, but it is also useful for this fault injection testing. Hitting a lot of faults in this test when I first started working on it is how I was reminded of how bad SOLR-8371 had become. I always knew it was an issue, but the minimum time between recoveries that we put in made it much worse.

        ASF subversion and git services added a comment -

        Commit 1720841 from Mark Miller in branch 'dev/trunk'
        [ https://svn.apache.org/r1720841 ]

        SOLR-8279: One of two tests was not calling TestInjection#clear after using it. Call clear in the Solr base test class instead.
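
The reasoning behind this change can be illustrated with a small sketch: if shared injection state is reset in the base class, an individual test cannot forget to do it. The `InjectionState` and `BaseTestCase` names below are hypothetical and a plain `try`/`finally` stands in for the test framework's teardown hook; this is not the actual Solr base test class.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for shared, static fault injection settings.
final class InjectionState {
  static final AtomicBoolean FAULTS_ENABLED = new AtomicBoolean(false);

  private InjectionState() {}

  static void clear() {
    FAULTS_ENABLED.set(false);
  }
}

// Hypothetical base class: every test body runs through run(), so the
// shared state is reset even when a test forgets to or throws.
abstract class BaseTestCase {
  protected abstract void runTest() throws Exception;

  final void run() throws Exception {
    try {
      runTest();
    } finally {
      InjectionState.clear();
    }
  }
}
```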

        Mark Miller added a comment -

        Okay, I think this is ready to go to 5x.

        ASF subversion and git services added a comment -

        Commit 1721935 from Mark Miller in branch 'dev/branches/branch_5x'
        [ https://svn.apache.org/r1721935 ]

        SOLR-8279: Add a new test fault injection approach and a new SolrCloud test that stops and starts the cluster while indexing data and with random faults.

        ASF subversion and git services added a comment -

        Commit 1721936 from Mark Miller in branch 'dev/branches/branch_5x'
        [ https://svn.apache.org/r1721936 ]

        SOLR-8279: Close factories in unrelated test.

        ASF subversion and git services added a comment -

        Commit 1721937 from Mark Miller in branch 'dev/branches/branch_5x'
        [ https://svn.apache.org/r1721937 ]

        SOLR-8279: end searcher tracking before object release tracker.

        ASF subversion and git services added a comment -

        Commit 1721938 from Mark Miller in branch 'dev/branches/branch_5x'
        [ https://svn.apache.org/r1721938 ]

        SOLR-8279: Do not fail tests due to searcher tracking - just use that for waiting and use ObjectReleaseTracker for the fail since it has more detailed info.

        ASF subversion and git services added a comment -

        Commit 1724518 from Mark Miller in branch 'dev/branches/branch_5x'
        [ https://svn.apache.org/r1724518 ]

        SOLR-8279: One of two tests was not calling TestInjection#clear after using it. Call clear in the Solr base test class instead.


          People

          • Assignee: Mark Miller
          • Reporter: Mark Miller
          • Votes: 0
          • Watchers: 5
