Affects Version/s: None
Fix Version/s: None
I am the author of Failify, a test framework for end-to-end testing of distributed systems. Failify can be used to deterministically inject failures during a normal test case execution. Currently, node failure, network partition, network delay, network packet loss, and clock drift are supported. For a few supported languages (right now, Java and Scala), it is also possible to enforce a specific order of events between nodes to reproduce a time-sensitive scenario, and to inject failures before or after a specific method is called when a specific stack trace is present. You can find more information at https://failify.io.
I think Failify would be useful to Kafka for the following reasons:
- It is Docker-based, which keeps the test environment less messy, and you can run the test cases on a single node and in parallel (there are plans to support deploying the same test case on a K8s or Swarm cluster).
- Because it is Docker-based, you can easily have test cases that run on different OSes. You can also define the services you depend on, e.g. ZK, as another node in your deployment definition.
- The supported failure kinds are a superset of what Trogdor currently supports (in particular, network delay and loss, clock drift, and more sophisticated network partitioning).
- There will be more control over when a failure should be introduced in a test case.
- You can write your test cases in Java, Scala, or any other language that runs on the JVM and can use Java libraries.
- It can be easily integrated into your build pipeline, because the test cases are regular JUnit test cases.
- The API is compact and intuitive, and the tool is well documented.
Please let me know if you want to give it a try.