The goal is to provide Bigtop module maintainers with a set of utility classes for developing smoke tests that can simulate certain failures while the tests run on a cluster.
Summary of what is provided in the current patch.
The following failure types are currently supported:
- Service stopped and restarted (on given set of nodes)
- Service killed with 'kill -9' and started back up (on given set of nodes)
- Node inbound/outbound connections are shut down and brought back up (via iptables).
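As a rough illustration of the three failure types above, the following Python sketch builds the kind of shell commands each one might translate to on a target node. The function names, the use of `service` and `pkill`, and the specific iptables rules are assumptions made for this example; they are not taken from the patch.

```python
import shlex


def restart_commands(service):
    """Graceful failure: stop the service, then start it again."""
    name = shlex.quote(service)
    return [f"service {name} stop", f"service {name} start"]


def kill_commands(service, process_pattern):
    """Hard failure: SIGKILL the matching processes, then start the service.

    `process_pattern` is a hypothetical argument: a string matched against
    the full command line of the processes to kill.
    """
    return [
        f"pkill -9 -f {shlex.quote(process_pattern)}",
        f"service {shlex.quote(service)} start",
    ]


def network_commands():
    """Network failure: drop all inbound/outbound traffic, then undo it.

    Returns (fail, restore). Assumption: the restore step deletes exactly
    the rules that the fail step appended.
    """
    fail = ["iptables -A INPUT -j DROP", "iptables -A OUTPUT -j DROP"]
    restore = ["iptables -D INPUT -j DROP", "iptables -D OUTPUT -j DROP"]
    return fail, restore
```

Note that dropping all traffic also cuts the SSH connection used to issue the commands, so in a scheme like this the fail and restore steps would probably have to be sent to the node as one remote command with a delay in between.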
System requirements to run smoke tests with failures.
- password-less (PKI-based) root ssh access to all nodes in the cluster under test is assumed.
- for local tests, such as ClusterFailuresTest, password-less root ssh to localhost is required.
- the environment variable BIGTOP_SMOKES_CLUSTER_IDENTITY_FILE should point to the corresponding private key file.
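The requirements above can be sketched as a small helper that assembles an ssh invocation from the identity file named in BIGTOP_SMOKES_CLUSTER_IDENTITY_FILE. This is a minimal Python sketch; the function name `ssh_command` and the choice of `BatchMode=yes` (which makes ssh fail instead of prompting for a password) are assumptions of the example, not part of the patch.

```python
import os

ENV_VAR = "BIGTOP_SMOKES_CLUSTER_IDENTITY_FILE"


def ssh_command(host, remote_command, env=os.environ):
    """Build the argument list for a root ssh call to `host`.

    `env` is injectable so the helper can be exercised without touching
    the real process environment.
    """
    key_file = env.get(ENV_VAR)
    if not key_file:
        raise RuntimeError(f"{ENV_VAR} is not set")
    return [
        "ssh",
        "-i", key_file,          # private key for password-less login
        "-o", "BatchMode=yes",   # never fall back to interactive prompts
        f"root@{host}",
        remote_command,
    ]
```

A quick way to check the local setup would be to pass `ssh_command("localhost", "true")` to `subprocess.run(..., check=True)`, which raises if password-less root ssh to localhost is not working.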
Further thoughts (not included in this patch)
- The failures part of the Bigtop test framework doesn't need to know about cluster topology, since it simply executes a set of SSH commands on remote hosts whose addresses are supplied by the developer of each module's smoke test. The actual tests, however, do need to know the cluster topology in order to run sophisticated failure scenarios.
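The split described above might look roughly like the following Python sketch: the framework side only receives a list of hosts and a command, while the topology lives in the test. All names here (`run_on_hosts`, `TOPOLOGY`, `fail_one_datanode`, the example hostnames and service name) are hypothetical and chosen only for illustration.

```python
def run_on_hosts(hosts, command, runner):
    """Framework side: run one command on each given host.

    Knows nothing about roles or topology; `runner(host, command)` is
    whatever actually executes the command (for example, over ssh).
    """
    return {host: runner(host, command) for host in hosts}


# Test side: a hypothetical topology owned by the module's smoke test.
TOPOLOGY = {
    "namenode": ["nn1.example.com"],
    "datanode": ["dn1.example.com", "dn2.example.com"],
}


def fail_one_datanode(runner):
    """Scenario that needs topology: stop the service on a single datanode."""
    victims = TOPOLOGY["datanode"][:1]
    return run_on_hosts(victims, "service hadoop-hdfs-datanode stop", runner)
```

Keeping the host selection in the test means the same framework call can serve any module, whatever its node roles are.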