While looking at the replication code for another reason, I noticed that the replication source might be a bit too eager to remove sinks from the list of valid sinks.
The current logic allows a sink to fail N times (default 3), after which it is removed from the list of sinks. But note that this failure count is never reduced, so given enough runtime and some network glitches, every sink will eventually be removed. When all sinks have been removed, the source picks new sinks and the counter is set to 0 for all of them.
I think we should change this to reset the counter each time we successfully replicate something to the sink (which proves the sink isn't dead). Or we could decrease the counter each time we successfully replicate, which might be better: if we consistently fail more attempts than we succeed, the sink should be removed.
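
To make the difference between the options concrete, here is a rough sketch in Python. It is not the actual replication code: `SinkTracker`, `report_failure`, `report_success`, `survives` and the policy names are all made up for illustration, and it assumes a sink is dropped as soon as its count reaches N.

```python
from dataclasses import dataclass, field


@dataclass
class SinkTracker:
    """Hypothetical per-sink failure tracker (illustrative only)."""
    max_failures: int = 3      # N from the description above
    policy: str = "never"      # "never" (current behaviour), "reset" or "decrement"
    failures: dict = field(default_factory=dict)
    bad_sinks: set = field(default_factory=set)

    def report_failure(self, sink):
        count = self.failures.get(sink, 0) + 1
        self.failures[sink] = count
        # Assumption: the sink is dropped once it has failed max_failures times.
        if count >= self.max_failures:
            self.bad_sinks.add(sink)

    def report_success(self, sink):
        if self.policy == "reset":
            # A success proves the sink is alive, so forget earlier failures.
            self.failures[sink] = 0
        elif self.policy == "decrement":
            # A success cancels out one earlier failure, never going below zero.
            self.failures[sink] = max(0, self.failures.get(sink, 0) - 1)
        # "never": the count is left untouched, as in the current logic.


def survives(policy, outcomes):
    """Replay a sequence of outcomes (True = success) against one sink."""
    tracker = SinkTracker(policy=policy)
    for ok in outcomes:
        if ok:
            tracker.report_success("sink-1")
        else:
            tracker.report_failure("sink-1")
    return "sink-1" not in tracker.bad_sinks
```

With a mostly healthy sink (one failure followed by four successes, repeated three times), the "never" policy still drops the sink on the third glitch, while "reset" and "decrement" both keep it. With a sink that fails two out of every three attempts, "reset" keeps it forever because the count never gets past 2, whereas "decrement" drops it on the fifth attempt. That is the reason the decrement variant looks more attractive.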