We should set up an end-to-end test that runs the general purpose job (FLINK-8971) in a standalone setting with HA enabled (ZooKeeper). The job's failures should be activated for this run.
Additionally, we should randomly kill Flink processes (cluster entrypoint and TaskExecutors). When killing them, we should also spawn new processes to make up for the loss.
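A minimal sketch of the kill-and-replace loop described above. It is not the actual test script: `spawn_worker`, `kill_and_replace` and `shutdown` are hypothetical names, and an idle Python child process stands in for a Flink cluster entrypoint or TaskExecutor.

```python
import random
import subprocess
import sys


def spawn_worker():
    # Hypothetical stand-in for starting a Flink process: an idle child process.
    return subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])


def kill_and_replace(workers, rng, spawn=spawn_worker):
    """Kill one randomly chosen worker and start a replacement in its slot."""
    index = rng.randrange(len(workers))
    victim = workers[index]
    victim.kill()  # abrupt termination, no graceful shutdown
    victim.wait()  # reap the child so it does not linger as a zombie
    workers[index] = spawn()  # make up for the loss
    return victim


def shutdown(workers):
    # Clean up all remaining stand-in processes.
    for worker in workers:
        worker.kill()
        worker.wait()


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so a failing run can be replayed
    workers = [spawn_worker() for _ in range(3)]
    try:
        for _ in range(5):
            kill_and_replace(workers, rng)
    finally:
        shutdown(workers)
```

Replacing the victim in the same slot keeps the number of live processes constant, which matches the requirement that every killed process is compensated by a new one.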
This end-to-end test case should run with each of the different state backend settings: RocksDB (full/incremental, async/sync) and FsStateBackend (sync/async).
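The settings listed above expand to six combinations. The sketch below enumerates them. The labels are illustrative only and are not Flink configuration keys, and `state_backend_matrix` is a hypothetical helper name.

```python
from itertools import product


def state_backend_matrix():
    """Return (backend, checkpoint_type, snapshot_mode) tuples for the test runs."""
    # RocksDB: every pairing of checkpoint type and snapshot mode.
    rocksdb = [
        ("rocksdb", checkpoint_type, mode)
        for checkpoint_type, mode in product(("full", "incremental"), ("async", "sync"))
    ]
    # FsStateBackend: only the snapshot mode varies, so the checkpoint type is None.
    fs = [("fs", None, mode) for mode in ("sync", "async")]
    return rocksdb + fs


if __name__ == "__main__":
    for combination in state_backend_matrix():
        print(combination)
```

Running the test once per tuple would cover every setting named in this ticket.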
We should then verify that the general purpose job is successfully recovered without data loss or other failures.