In the existing registry service implementation, a purge operation is triggered by every container finish event.
Since this happens on every container finish, it essentially scans all (or almost all) ZK nodes from the root.
We have a cluster which has hundreds of ZK nodes for the service registry and 20K+ ZK nodes for other purposes. The existing implementation can generate a massive number of ZK operations, as well as internal Java objects (RegistryPathStatus). The RM becomes very unstable when container finish events arrive in batches, because of full GC pauses and ZK connection failures.
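
To give a rough feel for the scale, here is a minimal, self-contained sketch of the cost of walking the whole tree once per container finish event. It is not the registry code: the Node class, the tree layout, and the figures of 300 registry nodes and 1,000 finish events are assumptions made up for illustration (only the 20K figure comes from the cluster described above). It uses a plain in-memory tree and makes no ZK calls.

```java
import java.util.ArrayList;
import java.util.List;

public class PurgeScanSketch {
    /** Hypothetical stand-in for a ZK node; not a real registry or ZooKeeper class. */
    static final class Node {
        final String name;
        final List<Node> children = new ArrayList<>();
        Node(String name) { this.name = name; }
    }

    /** Builds a root with one registry subtree and one subtree used for other purposes. */
    static Node buildTree(int registryNodes, int otherNodes) {
        Node root = new Node("/");
        Node registry = new Node("registry");
        Node other = new Node("other");
        root.children.add(registry);
        root.children.add(other);
        for (int i = 0; i < registryNodes; i++) {
            registry.children.add(new Node("r" + i));
        }
        for (int i = 0; i < otherNodes; i++) {
            other.children.add(new Node("o" + i));
        }
        return root;
    }

    /** Counts the nodes visited by a full recursive walk starting at the given node. */
    static long countVisited(Node node) {
        long visited = 1;
        for (Node child : node.children) {
            visited += countVisited(child);
        }
        return visited;
    }

    /** Total node visits if every container finish event triggers one walk from start. */
    static long totalVisits(Node start, int finishEvents) {
        return countVisited(start) * finishEvents;
    }

    public static void main(String[] args) {
        Node root = buildTree(300, 20_000); // assumed sizes, see the note above
        int finishEvents = 1_000;           // assumed batch size
        System.out.println("walk from root:        " + totalVisits(root, finishEvents));
        System.out.println("registry subtree only: " + totalVisits(root.children.get(0), finishEvents));
    }
}
```

With these assumed numbers, a walk from the root touches 20,303 nodes per event, against 301 for the registry subtree alone, so almost all of the work in a batch of finish events goes to nodes that have nothing to do with the service registry. In the real service each visit would also be a ZK read and at least one status object, which is consistent with the GC and connection pressure described above.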