Interesting. This gets back to caching checkpoints: a caching checkpoint is a way to optimize common computational paths in a DAG.
So the only way to implement checkpoints in Flink (other than with cacheHint = NONE) is to always dump data to DFS? If so, that is a serious limitation, because it rules out the most effective form of caching, i.e., reusing the object trees themselves. Even if the HDFS checkpoint hits the OS memory cache, we still spend time serializing and deserializing partitions for every tiny bit of new computation (such as taking a mean or a sum, or reducing and rebroadcasting intermediate statistics). This loop is very common in algorithms that run until convergence.

Heavyweight scheduling inside the loop in these types of systems (as opposed to one-time scheduling outside the loop, as in superstep systems) is already bad enough. If on top of that we need to serialize and deserialize across, say, 50 convergence iterations, that is pretty disastrous.
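To make the cost structure concrete, here is a minimal, self-contained sketch (plain Java serialization, not the Flink API; `roundTrip` and `converge` are hypothetical names for illustration). It models a convergence loop whose per-iteration input either reuses the live in-memory object tree ("cached") or round-trips through serialization on every step, the way a DFS-backed checkpoint would even when the bytes come from memory cache:

```java
import java.io.*;
import java.util.Arrays;

public class ConvergenceLoopDemo {
    // Stand-in for reading a checkpoint: serialize the partition,
    // then deserialize it back before it can be used.
    static double[] roundTrip(double[] partition) throws Exception {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
            out.writeObject(partition);
        }
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(buf.toByteArray()))) {
            return (double[]) in.readObject();
        }
    }

    // Iterate a tiny update step (halve every value) until the mean
    // stabilizes. cached = true reuses the live object tree; cached =
    // false pays ser/de on every iteration, like a DFS checkpoint.
    static double converge(double[] partition, boolean cached) throws Exception {
        double[] p = partition;
        double mean = Arrays.stream(p).average().orElse(0);
        double prev = Double.MAX_VALUE;
        int iters = 0;
        while (Math.abs(mean - prev) > 1e-9 && iters < 50) {
            double[] src = cached ? p : roundTrip(p); // per-iteration deserialization
            prev = mean;
            p = Arrays.stream(src).map(v -> v / 2).toArray();
            mean = Arrays.stream(p).average().orElse(0);
            iters++;
        }
        return mean;
    }

    public static void main(String[] args) throws Exception {
        double[] data = new double[1000];
        for (int i = 0; i < data.length; i++) data[i] = i;
        double cachedMean = converge(data, true);
        double uncachedMean = converge(data, false);
        // Same numeric result either way; only where the time goes differs:
        // the uncached run pays ~40 serialization round-trips for nothing.
        System.out.println(cachedMean == uncachedMean);
    }
}
```

The point of the sketch is that both runs compute the identical answer; the serialization round-trips buy no correctness, they are pure per-iteration overhead that an object-tree cache would avoid entirely.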
So there is absolutely no way to keep datasets as object trees inside the worker VMs between computations?