Affects Version/s: 0.10.0
Fix Version/s: 0.10.0
While testing the 0.10 release candidate with a large number of topic partitions and containers, we observed a serious stability issue caused by combining the checkpoint and coordinator streams.
The reasons are as follows.
The JobCoordinator currently uses the coordinator stream for the following:
- job configuration
- task changelog partition map
- container locality info
- misc (like migration marker, etc.)
Of everything written to the coordinator stream, checkpoints create the biggest problem:
1) It is much more dynamic than the others. With every container checkpointing, keeping up to date with the tail of the coordinator stream becomes impossible. Mixing checkpoints with other messages in one stream also makes it impossible to tell whether the stream holds important configuration/status information that must be read immediately or just checkpoint updates.
2) For checkpoints, the JobCoordinator does not need to be in the middle of the path. Our previous checkpoint model works properly: checkpoints are written by the containers and read by the containers, and it is very clear that reads happen only when a container recovers/restarts, while writes happen during container runtime. Making all containers rely on the JobCoordinator to read the latest checkpoint makes the JobCoordinator a single-point bottleneck.
3) With the current change, removing CheckpointManagerFactory also removes the ability for users to plug in their own checkpoint system.
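To make points 2) and 3) concrete, here is a minimal sketch of the container-owned checkpoint path with a pluggable backend. It is written in Python for brevity and is not Samza's actual API: every name in it (CheckpointManager, InMemoryCheckpointManager, FACTORIES, Container) is a hypothetical stand-in. It only illustrates that the read happens once at container start, the writes happen at runtime, and no coordinator sits on the path.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, Optional


class CheckpointManager(ABC):
    """Hypothetical pluggable checkpoint backend (not the Samza interface)."""

    @abstractmethod
    def write_checkpoint(self, task_name: str, offsets: Dict[str, str]) -> None:
        ...

    @abstractmethod
    def read_last_checkpoint(self, task_name: str) -> Optional[Dict[str, str]]:
        ...


class InMemoryCheckpointManager(CheckpointManager):
    """Stand-in for a dedicated checkpoint store, kept in memory for the sketch."""

    def __init__(self) -> None:
        self._store: Dict[str, Dict[str, str]] = {}

    def write_checkpoint(self, task_name: str, offsets: Dict[str, str]) -> None:
        # Store a copy so later in-place updates by the caller do not leak in.
        self._store[task_name] = dict(offsets)

    def read_last_checkpoint(self, task_name: str) -> Optional[Dict[str, str]]:
        saved = self._store.get(task_name)
        return dict(saved) if saved is not None else None


# Assumed registry: users plug in their own backend by adding a factory here.
FACTORIES: Dict[str, Callable[[], CheckpointManager]] = {
    "in-memory": InMemoryCheckpointManager,
}


class Container:
    """A container reads its checkpoint once at start and writes at runtime."""

    def __init__(self, task_name: str, manager: CheckpointManager) -> None:
        self.task_name = task_name
        self.manager = manager
        # The only read: on start/restart, directly from the checkpoint store.
        self.offsets: Dict[str, str] = manager.read_last_checkpoint(task_name) or {}

    def process(self, partition: str, offset: str) -> None:
        # Track the latest processed offset per input partition.
        self.offsets[partition] = offset

    def commit(self) -> None:
        # Runtime write: goes straight to the checkpoint store.
        self.manager.write_checkpoint(self.task_name, self.offsets)
```

In this sketch, a restarted container built against the same manager picks up the last committed offsets on its own, so checkpoint traffic never has to pass through the coordinator stream.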
Bottom line: writes to the coordinator stream should be limited to configuration and other low-volume information. High-volume traffic should not be sent to the coordinator stream, which the JobCoordinator often uses to build an up-to-date job status.
In addition, it would be preferable to move checkpoints back before this release, to avoid an unnecessary migration of checkpoints to the coordinator stream.