Most of Flink's integration tests that execute full Flink programs and check their results work by writing the results to a temporary output file and comparing the file's content to a provided set of expected Strings. Flink's test utils make this convenient and hide much of the complexity. Nonetheless, the approach has a few drawbacks:
- increased latency from going through disk
- comparison is on the String representation of objects, not on the objects themselves
- depends on the file system
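To illustrate the second drawback, here is a minimal plain-Java sketch (no Flink involved; the class and method names are made up for illustration). It shows two results that differ as objects but are indistinguishable once reduced to their String representation, which is the kind of mismatch a file-based comparison can miss.

```java
import java.util.Arrays;
import java.util.List;

public class StringVsObjectComparison {

    // Compares two results the way a file-based test effectively does: by their text form.
    static boolean equalAsStrings(Object expected, Object actual) {
        return String.valueOf(expected).equals(String.valueOf(actual));
    }

    // Compares two results as objects, which is possible once results are collected directly.
    static boolean equalAsObjects(Object expected, Object actual) {
        return expected.equals(actual);
    }

    public static void main(String[] args) {
        // Two different lists whose toString() output is identical: "[a, b, c]".
        List<String> expected = Arrays.asList("a, b", "c");
        List<String> actual = Arrays.asList("a", "b, c");
        System.out.println(equalAsStrings(expected, actual));
        System.out.println(equalAsObjects(expected, actual));

        // An Integer and a Long with the same printed value are not equal as objects.
        System.out.println(equalAsStrings(1, 1L));
        System.out.println(equalAsObjects(1, 1L));
    }
}
```

In both cases the String comparison reports a match while the object comparison does not, so a wrong result type or a wrong field split could slip through a test that only compares text.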
Since Flink's collect() feature was added, the temp file approach is no longer the best option. Instead, tests can collect the result of a Flink program directly as objects and compare these against a set of expected objects.
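A collect()-based check might look roughly like the sketch below. It assumes the DataSet API (ExecutionEnvironment, DataSet.collect()) and the Flink Java and client libraries on the classpath. The class name, the tiny example program, and the sameElements helper are hypothetical and are not part of Flink's test utils.

```java
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class CollectBasedTestSketch {

    // Runs a small example program and returns its result as objects via collect().
    static List<Integer> runProgram() throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        DataSet<Integer> doubled = env.fromElements(1, 2, 3)
            .map(new MapFunction<Integer, Integer>() {
                @Override
                public Integer map(Integer value) {
                    return value * 2;
                }
            });
        // collect() triggers execution and returns the result to the client as a List.
        return doubled.collect();
    }

    // Hypothetical helper: order-insensitive comparison, since a parallel job
    // does not guarantee the order in which results arrive.
    static boolean sameElements(List<Integer> expected, List<Integer> actual) {
        List<Integer> e = new ArrayList<>(expected);
        List<Integer> a = new ArrayList<>(actual);
        Collections.sort(e);
        Collections.sort(a);
        return e.equals(a);
    }

    public static void main(String[] args) throws Exception {
        List<Integer> expected = Arrays.asList(2, 4, 6);
        if (!sameElements(expected, runProgram())) {
            throw new AssertionError("unexpected result");
        }
    }
}
```

No temporary file is written or parsed here, and the comparison uses equals() on the collected objects. In a real test the final check would normally be a JUnit assertion instead of a main method.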
It would be good to migrate the existing test base to use collect() instead of temporary output files.