Solr / SOLR-8279

Add a new test fault injection approach and a new SolrCloud test that stops and starts the cluster while indexing data and with random faults.

    Details

    • Type: Test
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 5.5, 6.0
    • Component/s: None
    • Labels: None
    1. SOLR-8279.patch, 43 kB, Mark Miller
    2. SOLR-8279.patch, 45 kB, Mark Miller
    3. SOLR-8279.patch, 40 kB, Mark Miller
    4. SOLR-8279.patch, 39 kB, Mark Miller
    5. SOLR-8279.patch, 6 kB, Mark Miller
    6. SOLR-8279.patch, 6 kB, Mark Miller

        Activity

        markrmiller@gmail.com Mark Miller added a comment -

        Started some work on a new test.

        jira-bot ASF subversion and git services added a comment -

        Commit 1714216 from Mark Miller in branch 'dev/trunk'
        [ https://svn.apache.org/r1714216 ]

        SOLR-8279: Add a new SolrCloud test that stops and starts the cluster while indexing data.

        jira-bot ASF subversion and git services added a comment -

        Commit 1714218 from Mark Miller in branch 'dev/branches/branch_5x'
        [ https://svn.apache.org/r1714218 ]

        SOLR-8279: Add a new SolrCloud test that stops and starts the cluster while indexing data.

        markrmiller@gmail.com Mark Miller added a comment -

        I'm going to do some more work on this, but committing what I have for now.

        mdrob Mike Drob added a comment - edited

        + threads = new ArrayList<>(2);

        Should be ArrayList<>(numThreads);

        + thread.safeStop();
        + thread.safeStop();

        Typo, or some nuance here?

        + public void stopAndStartAllReplicas() throws Exception, InterruptedException {
        +   chaosMonkey.stopAll(random().nextInt(2000));
        +
        +   Thread.sleep(1000);
        +
        +   chaosMonkey.startAll();
        + }

        Is sleeping for one second sufficient here? Do we want to instead sleep until some condition is met (like all the servers are fully down, in case there is a straggler)?
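
The "sleep until some condition is met" idea can be sketched as a small polling helper. This is only an illustration, not code from the patch: `WaitFor`, `waitFor`, and the `allDown` flag are hypothetical names, and how a test would actually detect that every server is down is left as an assumption behind the `BooleanSupplier`.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BooleanSupplier;

/** Hypothetical helper (not part of the patch): polls a condition instead of sleeping a fixed time. */
final class WaitFor {
  private WaitFor() {}

  /**
   * Checks {@code condition} at least once, then re-checks roughly every {@code pollMs}
   * milliseconds until it holds or {@code timeoutMs} has elapsed.
   * Returns true if the condition was met, false on timeout or interruption.
   */
  static boolean waitFor(BooleanSupplier condition, long timeoutMs, long pollMs) {
    long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
    while (true) {
      if (condition.getAsBoolean()) {
        return true;
      }
      long remainingNanos = deadline - System.nanoTime();
      if (remainingNanos <= 0) {
        return false;
      }
      long remainingMs = Math.max(1, TimeUnit.NANOSECONDS.toMillis(remainingNanos));
      try {
        Thread.sleep(Math.min(pollMs, remainingMs));
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // preserve the interrupt for the caller
        return false;
      }
    }
  }

  public static void main(String[] args) throws Exception {
    // Stand-in for "all servers are fully down"; a real test would query the cluster.
    AtomicBoolean allDown = new AtomicBoolean(false);
    Thread straggler = new Thread(() -> {
      try {
        Thread.sleep(50);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
      allDown.set(true);
    });
    straggler.start();
    boolean down = waitFor(allDown::get, 5000, 10);
    straggler.join();
    System.out.println("all down: " + down);
  }
}
```

Compared with a fixed `Thread.sleep(1000)`, a helper like this returns as soon as the condition holds and gives the test an explicit timeout result to assert on when a straggler never stops.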

        markrmiller@gmail.com Mark Miller added a comment -

        This initial test has actually been committed already. I'm working on a much beefed-up version though, with additional fault injection to test various cluster restart issues.

        markrmiller@gmail.com Mark Miller added a comment -

        Here is my current state. It needs all kinds of cleanup, improvements, and other work before it could be committed, but it is already useful for finding issues.

        markrmiller@gmail.com Mark Miller added a comment -

        This is much closer to committable. A few issues left to work out.

        I've added a TestInjection class that lets you turn on random test fault injection, currently when sending a doc to a replica or when shutting down. I still have to make it so that when you turn on a fault injection you can specify its odds of hitting.
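
As a rough illustration of fault injection with configurable odds, here is a minimal sketch. It does not reproduce the actual TestInjection class from the patch: `FaultPoints`, `InjectedFault`, the method names, and the `"sendDocToReplica"` point name are all hypothetical, chosen only to show the shape of the idea.

```java
import java.util.Map;
import java.util.Random;
import java.util.concurrent.ConcurrentHashMap;

/** Thrown when an enabled fault point fires. Hypothetical, for illustration only. */
final class InjectedFault extends RuntimeException {
  InjectedFault(String point) {
    super("injected fault at " + point);
  }
}

/** Hypothetical registry of named fault points, each with a percent chance of firing. */
final class FaultPoints {
  private static final Map<String, Integer> ODDS = new ConcurrentHashMap<>();
  private static volatile Random random = new Random();

  private FaultPoints() {}

  /** Fixes the random seed so a failing run can be reproduced. */
  static void seed(long seed) {
    random = new Random(seed);
  }

  /** Turns on a fault point that fires on roughly {@code percent} out of 100 calls. */
  static void enable(String point, int percent) {
    if (percent < 0 || percent > 100) {
      throw new IllegalArgumentException("percent must be 0..100: " + percent);
    }
    ODDS.put(point, percent);
  }

  /** Called from production code paths; a no-op unless the point has been enabled. */
  static void inject(String point) {
    Integer percent = ODDS.get(point);
    if (percent != null && random.nextInt(100) < percent) {
      throw new InjectedFault(point);
    }
  }

  /** Disables every fault point so one test cannot leak faults into the next. */
  static void clear() {
    ODDS.clear();
  }

  public static void main(String[] args) {
    seed(42);
    enable("sendDocToReplica", 30);
    int failures = 0;
    for (int i = 0; i < 1000; i++) {
      try {
        inject("sendDocToReplica");
      } catch (InjectedFault e) {
        failures++;
      }
    }
    clear();
    System.out.println("injected " + failures + " faults out of 1000 calls");
  }
}
```

Keeping the disabled path to a single map lookup means the hook costs almost nothing when no test has enabled it, and a single `clear()` call gives tests one place to reset all injected state.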

        I've also made it so that we can get away with not closing things in a valid test by having the ObjectReleaseTracker close unclosed objects at the end of the test if its release check has been disabled on the test via annotation.
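
The "close leftovers when the release check is disabled via annotation" behavior might look roughly like the sketch below. It is an assumption-laden illustration, not the real ObjectReleaseTracker: `SkipReleaseCheck`, `ReleaseTracker`, `DemoResource`, and the two demo test classes are hypothetical names.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

/** Hypothetical marker: a test class carrying it opts out of the release check. */
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface SkipReleaseCheck {}

/** Hypothetical tracker of objects that must be released before a test ends. */
final class ReleaseTracker {
  private static final Set<AutoCloseable> OPEN = Collections.synchronizedSet(
      Collections.newSetFromMap(new IdentityHashMap<AutoCloseable, Boolean>()));

  private ReleaseTracker() {}

  static void track(AutoCloseable o) {
    OPEN.add(o);
  }

  static void release(AutoCloseable o) {
    OPEN.remove(o);
  }

  /**
   * End-of-test hook. With the check enabled, leftovers fail the test. With the check
   * disabled via annotation, leftovers are closed quietly. Returns how many were closed.
   */
  static int finish(Class<?> testClass) {
    List<AutoCloseable> leaked;
    synchronized (OPEN) {
      leaked = new ArrayList<>(OPEN);
      OPEN.clear();
    }
    if (leaked.isEmpty()) {
      return 0;
    }
    if (!testClass.isAnnotationPresent(SkipReleaseCheck.class)) {
      throw new AssertionError(leaked.size() + " object(s) were not released");
    }
    for (AutoCloseable c : leaked) {
      try {
        c.close();
      } catch (Exception e) {
        // Best effort: the test opted out of the check, so close failures are ignored.
      }
    }
    return leaked.size();
  }
}

/** Minimal resource used only to demonstrate the tracker. */
final class DemoResource implements AutoCloseable {
  volatile boolean closed;

  @Override
  public void close() {
    closed = true;
  }
}

/** Stand-in for a fault-injection test that may legitimately leave objects open. */
@SkipReleaseCheck
final class LenientDemoTest {}

/** Stand-in for an ordinary test where a leak should fail the run. */
final class StrictDemoTest {}
```

In this sketch a test that injects faults on shutdown can leave objects open without failing, while every unannotated test still gets the strict leak check.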

        markrmiller@gmail.com Mark Miller added a comment -

        All the nocommits and such are out. I think this is pretty close, but I still need to go over it once more.

        jira-bot ASF subversion and git services added a comment -

        Commit 1720613 from Mark Miller in branch 'dev/trunk'
        [ https://svn.apache.org/r1720613 ]

        SOLR-8279: Add a new test fault injection approach and a new SolrCloud test that stops and starts the cluster while indexing data and with random faults.

        markrmiller@gmail.com Mark Miller added a comment -

        There is the commit to trunk. Reviews welcome. This was a bit of a beast to get done in a way that could be run as part of the normal test framework, coming from my original "just hack together a test I can run" approach, but I think I've now got a great base for adding more failure / fault injection tests.

        jira-bot ASF subversion and git services added a comment -

        Commit 1720624 from Mark Miller in branch 'dev/trunk'
        [ https://svn.apache.org/r1720624 ]

        SOLR-8279: Close factories in unrelated test.

        jira-bot ASF subversion and git services added a comment -

        Commit 1720627 from Mark Miller in branch 'dev/trunk'
        [ https://svn.apache.org/r1720627 ]

        SOLR-8279: end searcher tracking before object release tracker.

        jira-bot ASF subversion and git services added a comment -

        Commit 1720631 from Mark Miller in branch 'dev/trunk'
        [ https://svn.apache.org/r1720631 ]

        SOLR-8279: Do not fail tests due to searcher tracking - just use that for waiting and use ObjectReleaseTracker for the fail since it has more detailed info.

        markrmiller@gmail.com Mark Miller added a comment -

        I'll give Jenkins some time before backporting this to 5x.

        markrmiller@gmail.com Mark Miller added a comment -

        SOLR-8371 is a really good improvement in general, but it is also useful for this fault injection testing. The many faults I hit in this test when I first started working on it are what reminded me how bad the SOLR-8371 problem had become. I always knew it was an issue, but the minimum time between recoveries that we put in made it much worse.

        jira-bot ASF subversion and git services added a comment -

        Commit 1720841 from Mark Miller in branch 'dev/trunk'
        [ https://svn.apache.org/r1720841 ]

        SOLR-8279: One of two tests was not calling TestInjection#clear after using it. Call clear in the Solr base test class instead.

        markrmiller@gmail.com Mark Miller added a comment -

        Okay, I think this is ready to go to 5x.

        jira-bot ASF subversion and git services added a comment -

        Commit 1721935 from Mark Miller in branch 'dev/branches/branch_5x'
        [ https://svn.apache.org/r1721935 ]

        SOLR-8279: Add a new test fault injection approach and a new SolrCloud test that stops and starts the cluster while indexing data and with random faults.

        jira-bot ASF subversion and git services added a comment -

        Commit 1721936 from Mark Miller in branch 'dev/branches/branch_5x'
        [ https://svn.apache.org/r1721936 ]

        SOLR-8279: Close factories in unrelated test.

        jira-bot ASF subversion and git services added a comment -

        Commit 1721937 from Mark Miller in branch 'dev/branches/branch_5x'
        [ https://svn.apache.org/r1721937 ]

        SOLR-8279: end searcher tracking before object release tracker.

        jira-bot ASF subversion and git services added a comment -

        Commit 1721938 from Mark Miller in branch 'dev/branches/branch_5x'
        [ https://svn.apache.org/r1721938 ]

        SOLR-8279: Do not fail tests due to searcher tracking - just use that for waiting and use ObjectReleaseTracker for the fail since it has more detailed info.

        jira-bot ASF subversion and git services added a comment -

        Commit 1724518 from Mark Miller in branch 'dev/branches/branch_5x'
        [ https://svn.apache.org/r1724518 ]

        SOLR-8279: One of two tests was not calling TestInjection#clear after using it. Call clear in the Solr base test class instead.


          People

          • Assignee: markrmiller@gmail.com Mark Miller
          • Reporter: markrmiller@gmail.com Mark Miller
          • Votes: 0
          • Watchers: 5
