Details
Type: Improvement
Status: Closed
Priority: Major
Resolution: Fixed
None
Description
Once objects such as Scheduler, RMProxy, HealthChecker, internalMetricsCollector, etc. are initialized, there is no way to stop the background goroutines they start. This isn't necessarily a problem in a real environment, because restarting the scheduler core on its own is not a requirement.
However, in tests that use MockScheduler, we don't want goroutines to keep running and consuming memory. Therefore, we need a proper Stop() method on the most relevant types to make sure that the stop signal propagates to all goroutines.
The attached screenshot shows what happens after we call MockScheduler.Stop() in the core (it's very similar to the MockScheduler in the shim). Goroutines from the following types are still running:
- Scheduler
- nodesResourceUsageMonitor
- HealthChecker
- RMProxy
- EventSystemImpl
- partitionManager
- internalMetricsCollector
Something similar happens inside the shim, although it's less problematic. KubernetesShim.Stop() needs to be improved, because two goroutines depend on "stopChan", but we send a message only once, so only one of them receives it. It's much better to call close(ss.stopChan), which causes all readers to receive the stop signal.
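The difference between a single send and close() can be shown with a small standalone sketch. The function `stopWithClose` and its local variables are hypothetical and not taken from the shim code. A single send on an unbuffered channel is consumed by exactly one receiver, whereas a receive from a closed channel returns immediately for every reader.

```go
package main

import (
	"fmt"
	"sync"
)

// stopWithClose starts n goroutines that all block on the same stop channel,
// closes the channel, and returns how many goroutines observed the signal.
func stopWithClose(n int) int {
	stopChan := make(chan struct{})
	var wg sync.WaitGroup
	var mu sync.Mutex
	stopped := 0

	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			<-stopChan // unblocks for every reader once the channel is closed
			mu.Lock()
			stopped++
			mu.Unlock()
		}()
	}

	// A single "stopChan <- struct{}{}" here would wake only one reader and
	// leave the rest blocked; close wakes all of them.
	close(stopChan)
	wg.Wait()
	return stopped
}

func main() {
	fmt.Println(stopWithClose(2)) // prints 2
}
```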
Also, small changes are necessary in the shim-side MockScheduler to initiate shutdown properly (right now, we don't call fc.coreContext.StopAll()).