Currently, when we deploy a session cluster on YARN/K8s and submit a job to the existing cluster, some pending pods/containers may be requested because there are not enough resources. Even if the job then fails with a slot allocation timeout or is canceled, these pending pods/containers remain. They can only be released via the TaskManager idle timeout, and only after they have been allocated and launched.
The way pending pods/containers are released could be improved. Whenever the pending slots change in the SlotManager, it could notify the ActiveResourceManager so that it can take corresponding actions (e.g., release pending pods that are no longer needed). This would help a lot when the cluster is small and does not have many available resources.
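The proposed notification can be illustrated with a minimal Java sketch. It is not Flink's actual API: `PendingSlotsListener`, `SketchSlotManager` and `SketchActiveResourceManager` are hypothetical names, each worker is assumed to offer a fixed number of slots, and every requested worker is assumed to still be pending (none has registered yet). The slot manager reports its total pending slots on every change, and the resource manager trims its pending workers down to what is still required.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical callback: the slot manager reports how many slots are still pending. */
interface PendingSlotsListener {
    void onPendingSlotsChanged(int pendingSlots);
}

/** Hypothetical, simplified slot manager that only tracks pending slot requests per job. */
class SketchSlotManager {
    private final Map<String, Integer> pendingByJob = new HashMap<>();
    private final PendingSlotsListener listener;

    SketchSlotManager(PendingSlotsListener listener) {
        this.listener = listener;
    }

    void requestSlots(String jobId, int count) {
        pendingByJob.merge(jobId, count, Integer::sum);
        notifyListener();
    }

    /** Called when a job fails (e.g. slot allocation timeout) or is canceled. */
    void cancelJob(String jobId) {
        if (pendingByJob.remove(jobId) != null) {
            notifyListener();
        }
    }

    int totalPendingSlots() {
        return pendingByJob.values().stream().mapToInt(Integer::intValue).sum();
    }

    private void notifyListener() {
        listener.onPendingSlotsChanged(totalPendingSlots());
    }
}

/** Hypothetical active resource manager that keeps pending workers in line with pending slots. */
class SketchActiveResourceManager implements PendingSlotsListener {
    private final int slotsPerWorker; // assumption: every worker offers the same number of slots
    private final Deque<String> pendingWorkers = new ArrayDeque<>();
    private int nextWorkerId = 0;

    SketchActiveResourceManager(int slotsPerWorker) {
        this.slotsPerWorker = slotsPerWorker;
    }

    @Override
    public void onPendingSlotsChanged(int pendingSlots) {
        // Ceiling division: number of workers needed to cover the pending slots.
        int required = (pendingSlots + slotsPerWorker - 1) / slotsPerWorker;
        while (pendingWorkers.size() < required) {
            pendingWorkers.addLast("pod-" + nextWorkerId++); // stand-in for requesting a pod/container
        }
        while (pendingWorkers.size() > required) {
            String released = pendingWorkers.removeLast(); // stand-in for releasing a pending pod/container
            System.out.println("released pending worker " + released);
        }
    }

    int pendingWorkerCount() {
        return pendingWorkers.size();
    }
}

public class PendingWorkerSketch {
    public static void main(String[] args) {
        SketchActiveResourceManager rm = new SketchActiveResourceManager(2);
        SketchSlotManager sm = new SketchSlotManager(rm);
        sm.requestSlots("job-a", 4); // 4 pending slots -> 2 pending workers
        sm.requestSlots("job-b", 1); // 5 pending slots -> 3 pending workers
        sm.cancelJob("job-a");       // 1 pending slot -> 1 pending worker, 2 released
        System.out.println("pending workers: " + rm.pendingWorkerCount());
    }
}
```

In this sketch, canceling `job-a` immediately releases the two pending workers it no longer needs, instead of waiting for them to be allocated, launched and then idle-timed-out. A real implementation would also have to account for slots offered by already registered TaskManagers, which the sketch deliberately ignores.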