In FLINK-10653, Zhijiang has introduced a pluggable shuffle manager architecture which abstracts the process of data transfer between stages out of the Flink runtime as a shuffle service. Here I'd like to propose a common external shuffle service framework, so that most external shuffle services can be built more easily by composing and wrapping this framework and implementing a few interfaces specific to the target platform or deployment system.
As far as I'm concerned, a common external shuffle service scenario looks like this:
(1) a shuffle service daemon process runs on each host machine as a server, providing shuffle data to remote (or possibly local) consumers.
(2) a producer obtains, from the shuffle service daemon of its current host machine, a local persistent output directory, and then writes its shuffle data there.
(3) a consumer fetches its subpartition data from the shuffle service daemon on the host machine where the partition is located.
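The three steps above can be sketched as a minimal daemon contract. This is only an illustration of the proposed scenario, not the actual FLIP API; the interface and class names (`ShuffleServiceDaemon`, `LocalDiskShuffleDaemon`) and the file layout are my own assumptions.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical per-host daemon contract; names are assumptions, not the FLIP API. */
interface ShuffleServiceDaemon {
    /** Step (2): hand a producer a local persistent output directory for its partition. */
    Path registerProducer(String partitionId) throws IOException;
    /** Step (3): serve one subpartition's data to a (remote or local) consumer. */
    byte[] fetchSubpartition(String partitionId, int subpartition) throws IOException;
}

/** Minimal local-disk implementation, for illustration only. */
class LocalDiskShuffleDaemon implements ShuffleServiceDaemon {
    private final Map<String, Path> partitionDirs = new ConcurrentHashMap<>();
    private final Path root;

    LocalDiskShuffleDaemon(Path root) { this.root = root; }

    @Override
    public Path registerProducer(String partitionId) throws IOException {
        // One directory per partition; the producer writes subpartition files under it.
        Path dir = Files.createDirectories(root.resolve(partitionId));
        partitionDirs.put(partitionId, dir);
        return dir;
    }

    @Override
    public byte[] fetchSubpartition(String partitionId, int subpartition) throws IOException {
        // Assumed naming convention: one "sub-<index>" file per subpartition.
        Path dir = partitionDirs.get(partitionId);
        return Files.readAllBytes(dir.resolve("sub-" + subpartition));
    }
}
```

A real implementation would serve fetches over the network rather than through direct method calls, but the division of responsibility between producer, consumer, and daemon is the same.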
From my point of view, such a framework could be applicable to external shuffle services such as YarnShuffleService, KubernetesShuffleService and StandaloneShuffleService. As to KubernetesShuffleService, there is also another plan, called sidecar mode, to achieve a shuffle service on k8s, which puts a TM process and a shuffle service process into one pod. As a result, there might be multiple shuffle service daemons running on a host machine; this can still fit into the framework, since the only difference might be whether the port of each shuffle service process is fixed or not. According to Zhijiang's proposal, this case can be handled via UpdatePartitionInfo, so that the actual port of each shuffle service process can be propagated to the consumers.
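To make the dynamic-port case concrete, here is a rough sketch of how an UpdatePartitionInfo-style message could carry the actual port to a consumer. The class and method names (`PartitionLocation`, `ConsumerChannel`, `onUpdatePartitionInfo`) are hypothetical and chosen for illustration; the real mechanism lives in Flink's RPC layer.

```java
/** Hypothetical location descriptor; port is unknown (-1) until the daemon reports it. */
final class PartitionLocation {
    final String host;
    volatile int port;

    PartitionLocation(String host, int port) { this.host = host; this.port = port; }
}

/** Hypothetical consumer-side channel that waits for the real port before connecting. */
final class ConsumerChannel {
    private final PartitionLocation location;

    ConsumerChannel(PartitionLocation location) { this.location = location; }

    /** Invoked when an UpdatePartitionInfo-style message delivers the actual port. */
    void onUpdatePartitionInfo(int actualPort) { location.port = actualPort; }

    /** The consumer can only open a connection once a valid port is known. */
    boolean isConnectable() { return location.port > 0; }
}
```

With a fixed daemon port (e.g. one daemon per host in YARN mode) the update step is unnecessary; in sidecar mode, where each pod's shuffle process may bind an arbitrary port, the consumer simply defers connecting until the update arrives.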
This framework is not intended to cover external shuffle services which use global storage as the medium for shuffle data, such as DfsShuffleService, or other implementations which don't require an actual shuffle service role, such as RdmaShuffleService.