Apache YuniKorn / YUNIKORN-2

Support Gang Scheduling



      Description

      Gang scheduling is one of the most important features a scheduler can offer, and it is especially useful for machine learning workloads such as TensorFlow and PyTorch. Because YuniKorn already has the notion of an application, supporting gang scheduling should not be very hard.

      This umbrella issue tracks the effort to support gang scheduling in YuniKorn.
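
      For readers new to the term: gang scheduling places an application's tasks all together or not at all. The following is a minimal Python sketch of that all-or-nothing check, not YuniKorn's implementation; `TaskGroup`, `min_member`, `cpu_per_member`, `gang_fits`, and `schedule` are hypothetical names chosen for illustration, and CPU is the only resource considered.

```python
from dataclasses import dataclass


@dataclass
class TaskGroup:
    # Hypothetical model of a group of identical tasks in one application.
    name: str
    min_member: int      # number of tasks that must start together
    cpu_per_member: int  # CPU request per task, in millicores


def gang_fits(task_groups, free_cpu):
    """Return True only if every member of every group fits at the same time."""
    needed = sum(g.min_member * g.cpu_per_member for g in task_groups)
    return needed <= free_cpu


def schedule(task_groups, free_cpu):
    """All-or-nothing: reserve capacity for the whole gang, or reserve nothing."""
    if not gang_fits(task_groups, free_cpu):
        return {}, free_cpu  # no partial placement
    reserved = {g.name: g.min_member for g in task_groups}
    used = sum(g.min_member * g.cpu_per_member for g in task_groups)
    return reserved, free_cpu - used


groups = [TaskGroup("ps", 1, 1000), TaskGroup("worker", 3, 2000)]
print(schedule(groups, 8000))  # whole gang fits
print(schedule(groups, 6000))  # gang does not fit, nothing is reserved
```

      Without such a check, a scheduler could start some workers of a training job while the rest wait for capacity, leaving the started ones idle and holding resources.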


          Sub-Tasks

          1. Add TaskGroup field in app-CRD (Sub-task; Resolved; Ting Yao,Huang)
          2. Define app gang scheduling info in API package (Sub-task; Resolved; Weiwei Yang)
          3. Parse taskGroup info from pod annotation (Sub-task; Resolved; Ting Yao,Huang)
          4. Create a docker image to simulate app that requires gang scheduling (Sub-task; Resolved; Ting Yao,Huang)
          5. Fix application CRD definition (Sub-task; Resolved; Ting Yao,Huang)
          6. Add taskGroup info to apps/tasks (Sub-task; Resolved; Weiwei Yang)
          7. Retrieve app taskGroups from pod spec (Sub-task; Resolved; Ting Yao,Huang)
          8. Implement the placeholder cleanup in PlaceholderManager (Sub-task; Resolved; Ting Yao,Huang)
          9. Handle app reservation timeout (Sub-task; Resolved; Kinga Marton)
          10. Implement recycling service in PlaceholderManager that cleans up orphan placeholders (Sub-task; Resolved; Ting Yao,Huang)
          11. Implement placeholder swapping in the shim side (Sub-task; Resolved; Ting Yao,Huang)
          12. Implement container swapping logic in the scheduler core (Sub-task; Resolved; Wilfred Spiegelenburg)
          13. Handle placeholder pod recovery (Sub-task; Resolved; Wilfred Spiegelenburg)
          14. Make sure placeholder/taskGroupName are passed back to the core (Sub-task; Resolved; Weiwei Yang)
          15. Include node-selector and tolerations in the placeholder's pod spec (Sub-task; Resolved; Weiwei Yang)
          16. Improve UT coverage on core side after YUNIKORN-476 (Sub-task; Resolved; Wilfred Spiegelenburg)
          17. Dependency update after YUNIKORN-476 (Sub-task; Resolved; Weiwei Yang)
          18. Simplify the gangDeploy.sh (Sub-task; Resolved; Ting Yao,Huang)
          19. Remove the sleep in placeholder manager stop function (Sub-task; Resolved; Ting Yao,Huang)
          20. Remove some useless log messages (Sub-task; Resolved; Weiwei Yang)
          21. Cleanup placeholders when the app is Completed (Sub-task; Resolved; Kinga Marton)
          22. Placeholder manager failed to init during scheduler recovery (Sub-task; Resolved; Weiwei Yang)
          23. Gang scheduling waits indefinitely for placeholder pod allocation even when there is no quota left in the queue (Sub-task; Resolved; Manikandan R)
          24. Nil pointer exception while getting both termination and delete pod event (Sub-task; Resolved; Weiwei Yang)
          25. Scheduler is unable to recover from a restart (Sub-task; Resolved; Weiwei Yang)
          26. Placeholder pods must be running as a non-root user (Sub-task; Resolved; Weiwei Yang)
          27. Support adding labels/annotations to taskGroup (Sub-task; Resolved; Weiwei Yang)
          28. Skip creating placeholders for the completed apps post restart (Sub-task; Resolved; Weiwei Yang)
          29. Handle release allocation/ask events under Failing state (Sub-task; Resolved; Kinga Marton)
          30. Wait for placeholder cleanup (Sub-task; Resolved; Kinga Marton)


              People

              • Assignee: Weiwei Yang (wwei)
              • Reporter: Weiwei Yang (wwei)
              • Votes: 1
              • Watchers: 9
