Details
Type: Sub-task
Status: Open
Priority: Major
Resolution: Unresolved
Description
A network partition can cause a split-brain situation in which two leaders exist at the same time. We need a testing infrastructure or framework to simulate this case and verify whether our SCM HA implementation achieves strong consistency under a partitioned network.
Mukul Kumar Singh suggested two possible approaches:
a) Blockade tests: Blockade is a Docker-based framework in which the network of one DN can be isolated from the others.
b) MiniOzoneChaosCluster: a unit-test-based approach in which a random datanode is killed; this helped uncover consistency issues.
We might need a similar solution for SCM: block the SCM leader's network, and also increase the timeout so that the old leader does not turn into a candidate.
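To make the scenario concrete, here is a minimal in-memory sketch of the check such a test could perform. It is a toy model, not Ozone, Ratis, or Blockade code: `ToyCluster`, `Node`, and every method name are hypothetical, and the majority-quorum and term rules are simplified assumptions about how a consensus-based SCM HA would behave. The old leader is isolated and keeps believing it is the leader (as it would with a long timeout), yet its writes must fail, and after healing all replicas must agree.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A toy replica (hypothetical; not an Ozone or Ratis class)."""
    name: str
    term: int = 0
    is_leader: bool = False
    log: list = field(default_factory=list)


class ToyCluster:
    """Hypothetical in-memory cluster with majority-quorum writes."""

    def __init__(self, names):
        self.nodes = {n: Node(n) for n in names}
        self.groups = [set(names)]  # one group means no partition

    def partition(self, *groups):
        """Split the network into isolated connectivity groups."""
        self.groups = [set(g) for g in groups]

    def reachable(self, name):
        return next(g for g in self.groups if name in g)

    def has_quorum(self, name):
        return len(self.reachable(name)) > len(self.nodes) // 2

    def leaders(self):
        """Names of all nodes that currently believe they are leader."""
        return [n.name for n in self.nodes.values() if n.is_leader]

    def elect(self, name):
        """Make `name` leader if its side of the partition holds a majority."""
        if not self.has_quorum(name):
            return False
        peers = self.reachable(name)
        new_term = max(self.nodes[p].term for p in peers) + 1
        for p in peers:
            self.nodes[p].term = new_term
            self.nodes[p].is_leader = (p == name)
        return True

    def write(self, leader, value):
        """Commit `value` only with a majority and no newer term in sight."""
        node = self.nodes[leader]
        if not node.is_leader or not self.has_quorum(leader):
            return False
        peers = self.reachable(leader)
        if any(self.nodes[p].term > node.term for p in peers):
            node.is_leader = False  # stale leader steps down
            return False
        for p in peers:
            self.nodes[p].log.append((node.term, value))
        return True

    def heal(self):
        """Reconnect everyone; the highest-term leader wins and others catch up."""
        self.groups = [set(self.nodes)]
        current = [n for n in self.nodes.values() if n.is_leader]
        if not current:
            return
        winner = max(current, key=lambda n: n.term)
        for n in self.nodes.values():
            n.term = winner.term
            n.is_leader = n is winner
            n.log = list(winner.log)


def demo():
    cluster = ToyCluster(["scm1", "scm2", "scm3"])
    cluster.elect("scm1")
    cluster.write("scm1", "a")
    cluster.partition({"scm1"}, {"scm2", "scm3"})  # isolate the leader
    rejected = not cluster.write("scm1", "lost")   # minority side cannot commit
    cluster.elect("scm2")                          # two nodes now claim leadership
    cluster.write("scm2", "b")
    cluster.heal()
    return cluster, rejected


if __name__ == "__main__":
    cluster, rejected = demo()
    print(cluster.leaders(), rejected)
```

A real test would replace the toy cluster with actual SCM instances and a network-isolation tool, but the assertions would stay the same in spirit: the isolated leader cannot commit, at most one leader exists per term, and all replicas converge on the same log after the partition heals.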
Issue Links
- relates to HDDS-2720 Ozone Failure injection Service (Resolved)