  1. Apache YuniKorn
  2. YUNIKORN-2

Support Gang Scheduling

Details

    Description

      Gang scheduling is one of the most important features a scheduler can offer. It is especially useful for machine learning workloads such as TensorFlow and PyTorch. Since YuniKorn already has the notion of an application, supporting gang scheduling in that context should not be very hard.

      This umbrella issue tracks the work to support gang scheduling in YuniKorn.
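
      As a rough illustration of the all-or-nothing idea behind gang scheduling, the sketch below checks whether every member of every task group fits into the free capacity before anything is started. It is a minimal sketch, not YuniKorn's implementation: the TaskGroup and Resources types, their fields, and the GangDemand and GangFits functions are hypothetical names made up for this example. It compares only aggregate totals and does not model per-node placement or the placeholders mentioned in the sub-tasks.

```go
package main

import "fmt"

// TaskGroup is a hypothetical description of one group of identical tasks
// that must start together. It is not the YuniKorn API type.
type TaskGroup struct {
	Name      string
	MinMember int   // number of members that must be scheduled together
	MilliCPU  int64 // CPU needed by each member, in millicores
	MemoryMiB int64 // memory needed by each member, in MiB
}

// Resources is a simple CPU/memory pair used for demand and free capacity.
type Resources struct {
	MilliCPU  int64
	MemoryMiB int64
}

// GangDemand sums the resources needed to start every member of every group.
func GangDemand(groups []TaskGroup) Resources {
	var total Resources
	for _, g := range groups {
		total.MilliCPU += int64(g.MinMember) * g.MilliCPU
		total.MemoryMiB += int64(g.MinMember) * g.MemoryMiB
	}
	return total
}

// GangFits reports whether the whole gang fits into the free capacity.
// Either all members can be admitted, or none are.
func GangFits(groups []TaskGroup, free Resources) bool {
	need := GangDemand(groups)
	return need.MilliCPU <= free.MilliCPU && need.MemoryMiB <= free.MemoryMiB
}

func main() {
	// Example values are invented for illustration only.
	groups := []TaskGroup{
		{Name: "ps", MinMember: 2, MilliCPU: 500, MemoryMiB: 1024},
		{Name: "worker", MinMember: 4, MilliCPU: 1000, MemoryMiB: 2048},
	}
	fmt.Println("demand:", GangDemand(groups))
	fmt.Println("fits in large queue:", GangFits(groups, Resources{MilliCPU: 8000, MemoryMiB: 16384}))
	fmt.Println("fits in small queue:", GangFits(groups, Resources{MilliCPU: 4000, MemoryMiB: 16384}))
}
```

      The point of the check is that a partially started job (for example, workers without their parameter servers) would hold resources without making progress, so the decision is made for the gang as a whole.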

        Issue Links

          1. Add TaskGroup field in app-CRD (Sub-task, Closed, TingYao Huang)
          2. Define app gang scheduling info in API package (Sub-task, Closed, Weiwei Yang)
          3. Parse taskGroup info from pod annotation (Sub-task, Closed, TingYao Huang)
          4. Create a docker image to simulate app that requires gang scheduling (Sub-task, Closed, TingYao Huang)
          5. Fix application CRD definition (Sub-task, Closed, TingYao Huang)
          6. Add taskGroup info to apps/tasks (Sub-task, Closed, Weiwei Yang)
          7. Retrieve app taskGroups from pod spec (Sub-task, Closed, TingYao Huang)
          8. Implement the placeholder cleanup in PlaceholderManager (Sub-task, Closed, TingYao Huang)
          9. Handle app reservation timeout (Sub-task, Closed, Kinga Marton)
          10. Implement recycling service in PlaceholderManager that cleans up orphan placeholders (Sub-task, Closed, TingYao Huang)
          11. Implement placeholder swapping in the shim side (Sub-task, Closed, TingYao Huang)
          12. Implement container swapping logic in the scheduler core (Sub-task, Closed, Wilfred Spiegelenburg)
          13. Handle placeholder pod recovery (Sub-task, Closed, Wilfred Spiegelenburg)
          14. Make sure placeholder/taskGroupName are passed back to the core (Sub-task, Closed, Weiwei Yang)
          15. Include node-selector and tolerations in the placeholder's pod spec (Sub-task, Closed, Weiwei Yang)
          16. Improve UT coverage on core side after YUNIKORN-476 (Sub-task, Closed, Wilfred Spiegelenburg)
          17. Dependency update after YUNIKORN-476 (Sub-task, Closed, Weiwei Yang)
          18. Simplify the gangDeploy.sh (Sub-task, Closed, TingYao Huang)
          19. Remove the sleep in placeholder manager stop function (Sub-task, Closed, TingYao Huang)
          20. Remove some useless log messages (Sub-task, Closed, Weiwei Yang)
          21. Cleanup placeholders when the app is Completed (Sub-task, Closed, Kinga Marton)
          22. Placeholder manager failed to init during scheduler recovery (Sub-task, Closed, Weiwei Yang)
          23. Gang scheduling waits indefinitely for placeholder pod allocation even when there is no quota left in the queue (Sub-task, Closed, Manikandan R)
          24. Nil pointer exception while getting both termination and delete pod event (Sub-task, Closed, Weiwei Yang)
          25. Scheduler is unable to recover from a restart (Sub-task, Closed, Weiwei Yang)
          26. Placeholder pods must be running as a non-root user (Sub-task, Closed, Weiwei Yang)
          27. Support add labels/annotations to taskGroup (Sub-task, Closed, Weiwei Yang)
          28. Skip creating placeholders for the completed apps post restart (Sub-task, Closed, Weiwei Yang)
          29. Handle release allocation/ask events under Failing state (Sub-task, Closed, Kinga Marton)
          30. Wait for placeholder cleanup (Sub-task, Closed, Kinga Marton)

            People

              wwei Weiwei Yang