  1. Apache Ozone
  2. HDDS-2823 SCM HA Support
  3. HDDS-4237

Testing Infrastructure Random Failures


    Details

    • Type: Sub-task
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      Network partitioning can cause a split-brain case in which two leaders exist at the same time. We need some sort of testing infrastructure/framework to simulate such a case and verify whether our SCM HA implementation can achieve strong consistency under a partitioned network.

      Mukul Kumar Singh suggested two possible approaches:

      a) Blockade tests: Blockade is a Docker-based framework in which the
      network of one DN can be isolated from the others.

      b) MiniOzoneChaosCluster: a unit-test-based approach in which a
      random datanode is killed; this has helped find consistency
      issues.
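
      As a rough illustration of the random-failure idea in (b), here is a minimal sketch of a seeded failure plan. It is not MiniOzoneChaosCluster code: the function name plan_failures, the node names, and the max_down limit are all hypothetical. Seeding the generator makes a failing run replayable.

```python
import random


def plan_failures(nodes, rounds, seed, max_down=1):
    """Return a reproducible list of (step, action, node) events.

    Hypothetical helper: it only plans kill/restart events and does not
    touch a real cluster.
    """
    if len(nodes) < 2 or max_down < 1:
        raise ValueError("need at least 2 nodes and max_down >= 1")
    rng = random.Random(seed)  # same seed -> same plan
    alive, down = sorted(nodes), []
    events = []
    for step in range(rounds):
        # Only kill while under the limit and at least one node stays up.
        can_kill = len(down) < max_down and len(alive) > 1
        if can_kill and (not down or rng.random() < 0.5):
            node = rng.choice(alive)
            alive.remove(node)
            down.append(node)
            events.append((step, "kill", node))
        else:
            node = rng.choice(down)
            down.remove(node)
            alive.append(node)
            events.append((step, "restart", node))
    return events


if __name__ == "__main__":
    for event in plan_failures(["dn1", "dn2", "dn3"], rounds=6, seed=7):
        print(event)
```

      A test driver could walk this plan, apply each event to the cluster under test, and log the seed so that a failure can be replayed.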

      We might need a similar solution for SCM: block the SCM leader's network and also increase the timeout so that the old leader does not turn into a candidate.
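
      The property such a test would check can be sketched with a toy in-memory model. This is not SCM or Ratis code: the Cluster and Node classes, the term/quorum rules, and the names scm1..scm3 are simplifying assumptions. The isolated old leader keeps believing it is the leader, yet it must not be able to commit anything without a majority.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.term = 0
        self.is_leader = False
        self.log = []


class Cluster:
    """Toy majority-quorum model (hypothetical, for illustration only)."""

    def __init__(self, names):
        self.nodes = {n: Node(n) for n in names}
        self.blocked = set()  # nodes cut off from all other nodes

    def reachable(self, a, b):
        return a == b or (a not in self.blocked and b not in self.blocked)

    def _peers(self, name, max_term):
        # Nodes that can be reached and whose term does not exceed max_term.
        return [n for n in self.nodes.values()
                if self.reachable(name, n.name) and n.term <= max_term]

    def elect(self, name):
        cand = self.nodes[name]
        term = cand.term + 1
        voters = self._peers(name, term - 1)
        if len(voters) * 2 <= len(self.nodes):
            return False  # no majority, election fails
        for n in voters:
            n.term = term
            n.is_leader = False
        cand.is_leader = True
        return True

    def write(self, name, entry):
        leader = self.nodes[name]
        if not leader.is_leader:
            return False
        acks = self._peers(name, leader.term)
        if len(acks) * 2 <= len(self.nodes):
            return False  # cannot commit without a majority
        for n in acks:
            n.term = leader.term
            n.log.append(entry)
        return True


if __name__ == "__main__":
    c = Cluster(["scm1", "scm2", "scm3"])
    c.elect("scm1")
    c.write("scm1", "a")
    c.blocked.add("scm1")        # isolate the leader's network
    c.elect("scm2")              # majority side elects a new leader
    print(c.write("scm1", "stale"), c.write("scm2", "b"))
```

      In this model both scm1 and scm2 consider themselves leader after the partition, but only the write through scm2 reaches a majority. A real test would assert the same thing against the actual SCM HA cluster.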
