Apache YuniKorn / YUNIKORN-2233

Scheduler cannot be stopped properly


Details

    Description

      Once the Scheduler, RMProxy, HealthChecker, internalMetricsCollector and similar objects are initialized, there is no way to stop the background goroutines they start. This is not necessarily a problem in a real deployment, because restarting the scheduler core on its own is not a requirement.

      However, in tests that use MockScheduler, we don't want goroutines to keep running and consuming memory. Therefore, we need a proper Stop() method on the most relevant types to make sure that the stop signal propagates to all goroutines.
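A minimal sketch of the kind of Stop() method described above, assuming a component that owns its goroutines. The type `worker` and its fields are hypothetical and not taken from the YuniKorn code base. Closing a single stop channel wakes every goroutine, `sync.Once` makes repeated Stop() calls safe, and a `sync.WaitGroup` lets Stop() block until all goroutines have actually exited.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// worker is a hypothetical stand-in for a component (scheduler, health
// checker, metrics collector, ...) that starts background goroutines.
type worker struct {
	stopCh   chan struct{}  // closed exactly once to broadcast the stop signal
	stopOnce sync.Once      // makes Stop() safe to call more than once
	wg       sync.WaitGroup // tracks every goroutine started by Start()
	running  atomic.Int32   // number of live goroutines, for observation only
}

func newWorker() *worker {
	return &worker{stopCh: make(chan struct{})}
}

// Start launches n background loops that run until the stop signal arrives.
func (w *worker) Start(n int) {
	for i := 0; i < n; i++ {
		w.wg.Add(1)
		w.running.Add(1)
		go func() {
			defer w.wg.Done()         // runs last: lets Stop() return
			defer w.running.Add(-1)   // runs first: goroutine is finishing
			ticker := time.NewTicker(10 * time.Millisecond)
			defer ticker.Stop()
			for {
				select {
				case <-ticker.C:
					// periodic work would go here
				case <-w.stopCh:
					return
				}
			}
		}()
	}
}

// Stop closes the stop channel, which wakes every goroutine waiting on it,
// and then blocks until all of them have exited.
func (w *worker) Stop() {
	w.stopOnce.Do(func() { close(w.stopCh) })
	w.wg.Wait()
}

func main() {
	w := newWorker()
	w.Start(3)
	w.Stop()
	w.Stop() // a second call is a no-op
	fmt.Println("running goroutines:", w.running.Load())
}
```

Waiting on the WaitGroup is the design choice that matters for tests: once Stop() returns, the caller knows nothing from this component is still running.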

      The attached screenshot shows what happens after we call MockScheduler.Stop() in the core (the core's MockScheduler is very similar to the one in the shim). Goroutines started by the following types are still running:

      • Scheduler
      • nodesResourceUsageMonitor
      • HealthChecker
      • RMProxy
      • EventSystemImpl
      • partitionManager
      • internalMetricsCollector

      A similar problem exists inside the shim, although it is less severe. KubernetesShim.Stop() needs to be improved: two goroutines depend on "stopChan", but we send a message on it only once, so only one of them receives the stop signal. It is much better to call close(ss.stopChan), which causes all readers to receive the stop signal.
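The difference between sending once and closing the channel can be demonstrated in isolation. In this sketch (the function names are illustrative, not from the shim), several goroutines wait on the same channel: a single send is received by exactly one of them, while close() releases every reader, since a receive on a closed channel returns immediately.

```go
package main

import (
	"fmt"
	"time"
)

// countStopped starts `readers` goroutines that each wait on stopChan,
// calls signal to deliver the stop signal, and returns how many readers
// observed it before a short timeout.
func countStopped(readers int, signal func(chan struct{})) int {
	stopChan := make(chan struct{})
	results := make(chan struct{}, readers) // buffered so readers never block here
	for i := 0; i < readers; i++ {
		go func() {
			<-stopChan
			results <- struct{}{}
		}()
	}
	signal(stopChan)

	n := 0
	timeout := time.After(200 * time.Millisecond)
	for n < readers {
		select {
		case <-results:
			n++
		case <-timeout:
			return n // remaining readers are still blocked (leaked)
		}
	}
	return n
}

// sendOnce delivers a single value: only one reader can receive it.
func sendOnce(c chan struct{}) { c <- struct{}{} }

// closeAll closes the channel: every current and future reader is released.
func closeAll(c chan struct{}) { close(c) }

func main() {
	fmt.Println("send once, readers stopped:", countStopped(2, sendOnce))
	fmt.Println("close, readers stopped:", countStopped(2, closeAll))
}
```

Closing has the additional benefit that readers added later still see the signal, whereas a send must be matched one-for-one with the number of readers.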
      Also, small changes are necessary in the shim-side MockScheduler to initiate shutdown properly (right now, we don't call fc.coreContext.StopAll()).
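One way to make a test harness shut everything down is to have its Stop() stop every component it started, in reverse start order. This is a hypothetical sketch: `stopper`, `named` and `mockScheduler` are illustrative types, and the "core" entry only stands in for whatever fc.coreContext.StopAll() shuts down in the real shim-side MockScheduler. Stopping the shim before the core is an assumed design choice, so that the shim stops calling into a core that is already gone.

```go
package main

import "fmt"

// stopper is a hypothetical interface; the real shim and core types may differ.
type stopper interface{ Stop() }

// named is a hypothetical component that records its own shutdown.
type named struct {
	name string
	log  *[]string
}

func (n named) Stop() { *n.log = append(*n.log, n.name) }

// mockScheduler is a hypothetical harness. started lists components in the
// order they were started: the core first, then the shim on top of it.
type mockScheduler struct {
	started []stopper
}

// Stop shuts components down in reverse start order.
func (m *mockScheduler) Stop() {
	for i := len(m.started) - 1; i >= 0; i-- {
		m.started[i].Stop()
	}
}

func main() {
	var log []string
	m := &mockScheduler{started: []stopper{
		named{"core", &log}, // stand-in for the core context
		named{"shim", &log}, // stand-in for the shim
	}}
	m.Stop()
	fmt.Println(log)
}
```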

      Attachments

        1. goroutines_core.png (421 kB, Peter Bacsko)


          People

            Assignee: Peter Bacsko (pbacsko)
            Reporter: Peter Bacsko (pbacsko)
            Votes: 0
            Watchers: 3
