[SPARK-18278] SPIP: Support native submission of Spark jobs to a Kubernetes cluster


      Description

A new Apache Spark sub-project that enables native support for submitting Spark applications to a Kubernetes cluster. The submitted application runs in a driver executing inside a Kubernetes pod, and the lifecycles of its executors are likewise managed as pods.
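For illustration only (not part of the proposal text): a minimal Scala sketch of what native submission looks like from the configuration side. It assumes the k8s:// master URL scheme and spark.kubernetes.* configuration keys as they appear in the later upstream integration; the fork's exact key names may differ, and real cluster-mode submission goes through spark-submit rather than in-process code.

{code:scala}
// Hedged sketch: the configuration surface of an application whose driver and
// executors run as Kubernetes pods. The API-server address, image name, and the
// exact spark.kubernetes.* keys are placeholders.
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

object KubernetesSubmitSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setMaster("k8s://https://kubernetes.example.com:6443") // Kubernetes API server (placeholder)
      .setAppName("spark-pi-on-k8s")
      .set("spark.kubernetes.namespace", "spark")              // namespace in which driver/executor pods run
      .set("spark.kubernetes.container.image", "spark:2.3.0")  // image used for the driver and executor pods
      .set("spark.executor.instances", "2")                    // static allocation of two executor pods

    val spark = SparkSession.builder.config(conf).getOrCreate()
    println(spark.sparkContext.parallelize(1 to 1000).map(_ * 2).count())
    spark.stop()
  }
}
{code}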


          Activity

          apachespark Apache Spark added a comment -

          User 'liyinan926' has created a pull request for this issue:
          https://github.com/apache/spark/pull/19717

          apachespark Apache Spark added a comment -

          User 'foxish' has created a pull request for this issue:
          https://github.com/apache/spark/pull/19468

foxish Anirudh Ramanathan added a comment - edited

          An update on the plan -
          2 days ago, we cut https://github.com/apache-spark-on-k8s/spark/releases/tag/v2.2.0-kubernetes-0.4.0 and updated our documentation.

          We are planning to use that tagged release as the basis of our PRs to apache/spark in the 2.3 timeframe.
          https://github.com/apache-spark-on-k8s/spark/issues/441#issuecomment-330802935 is the tentative plan that we have. To bring greater visibility (beyond discussions in our fork), I'm posting here too.

It might be a good idea for us to now create sub-JIRAs and track the following individually:

          • Basic scheduler backend components
          • Spark Submit changes and basic k8s submission logic
          • More K8s specific submission steps
          • Dynamic allocation and shuffle service
          • CI and e2e Testing
          • Distributing docker images via the ASF process
          • Documentation

cc: Reynold Xin, Felix Cheung, Matei Zaharia

          foxish Anirudh Ramanathan added a comment -

The plan is to support beta/GA Kubernetes features from the two latest Kubernetes versions (the latest and the one before it) with each release of Spark. Currently, in our fork, we have support for k8s 1.7 and k8s 1.6 features. 1.8 adds priority and preemption for pods, but they're still in alpha. When they go into beta (likely by k8s 1.9), we will definitely add support for them.

lgrcyanny Cyanny added a comment - edited

Does spark-on-k8s have any plan to support priorities when submitting applications to a Kubernetes namespace? And application preemption?

          foxish Anirudh Ramanathan added a comment -

          Revision 2 of the Design Proposal

foxish Anirudh Ramanathan added a comment - edited

          FYI: sandflee's question was addressed in https://github.com/apache-spark-on-k8s/spark/issues/270

          Our last published update is here: http://apache-spark-developers-list.1001551.n3.nabble.com/An-Update-on-Spark-on-Kubernetes-Jun-23-td21843.html
          In the next couple of weeks, we plan on submitting an official SPIP detailing our management of the fork, demonstrating uptake, and sustained maintainership. Our goal is to graduate beyond experimental support into an upstream contribution for Spark 2.3.

cc: Reynold Xin, Sean Owen, Liang-Chi Hsieh

          sandflee sandflee added a comment -

For a Spark user, what benefits would we get, besides being able to co-run Docker apps and Spark apps?

          apachespark Apache Spark added a comment -

          User 'foxish' has created a pull request for this issue:
          https://github.com/apache/spark/pull/17522

          aash Andrew Ash added a comment -

          As an update on this ticket:

          For those not already aware, work on native Spark integration with Kubernetes has been proceeding for the past several months in this repo https://github.com/apache-spark-on-k8s/spark in the branch-2.1-kubernetes branch, based off the 2.1.0 Apache release.

We have an active core of about a half dozen contributors to the project, with a wider group of roughly another dozen observing. Communication happens through the issues on the GitHub repo, a dedicated room in the Kubernetes Slack, and weekly video conferences hosted by the Kubernetes Big Data SIG.

The full patch set is currently about 5500 lines, with about 500 of that being user/dev documentation. Infrastructure-wise, we have a cloud-hosted Jenkins CI instance, donated by project members, which runs both unit tests and Kubernetes integration tests over the code.

          We recently entered a code freeze for our release branch and are preparing a first release to the wider community, which we plan to announce on the general Spark users list. It includes the completed "phase one" portion of the design doc shared a few months ago (https://docs.google.com/document/d/1_bBzOZ8rKiOSjQg78DXOA3ZBIo_KkDJjqxVuq0yXdew/edit#heading=h.fua3ml5mcolt), featuring cluster mode with static allocation of executors, submission of local resources, SSL throughout, and support for JVM languages (Java/Scala).

          After that release we'll be continuing to stabilize and improve the phase one feature set and move into a second phase of kubernetes work. It will likely be focused on support for dynamic allocation, though we haven't finalized planning for phase two yet. Working on the pluggable scheduler in SPARK-19700 may be included as well.

          Interested parties are of course welcome to watch the repo, join the weekly video conferences, give the code a shot, and contribute to the project!

          apachespark Apache Spark added a comment -

          User 'erikerlandson' has created a pull request for this issue:
          https://github.com/apache/spark/pull/16061

          mcheah Matt Cheah added a comment -

          Hamel Ajay Kothari I created SPARK-19700 to track the pluggable scheduler API design.

          hkothari Hamel Ajay Kothari added a comment -

Hi all, the pluggable scheduler sounds pretty valuable for my workflow as well. We've got our own scheduler where I work that we need to support in Spark, and currently that requires us to maintain an entire fork that we must keep up to date. If we could get a pluggable scheduler, that would reduce our maintenance burden significantly.

          Has a JIRA been filed for this? I'd be happy to contribute some work on it.

          mcheah Matt Cheah added a comment -

I refactored the scheduler code as a thought experiment on what it would take to make the scheduler pluggable. The goal was to allow writers of a custom scheduler implementation to avoid any references to CoarseGrainedSchedulerBackend or TaskSchedulerImpl. A preliminary idea for the rewrite of the existing schedulers is here, and it's certainly a non-trivial change. The overarching philosophy of this prototype is to encourage dependency injection to wire together the scheduler components - see the implementations of ExternalClusterManagerFactory like YarnClusterManagerFactory - but there might be a better way to do this, and I haven't given enough thought to alternate designs. Some of the new public APIs introduced by this change were also defined arbitrarily but should be given more careful thought, such as the method signatures on ExecutorLifecycleHandler and the expectations from ExternalClusterManager.validate(). Existing components still refer to CoarseGrainedSchedulerBackend and TaskSchedulerImpl, but that's fine since the standalone, YARN, and Mesos scheduler internals should be able to use the non-public APIs. An implementation of the Kubernetes feature using this draft API is provided here, and the Kubernetes-specific components don't need to reference CoarseGrainedSchedulerBackend or TaskSchedulerImpl.
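To make the shape of that idea concrete, here is a purely illustrative Scala sketch. The trait and method names below are invented for this example and are not the prototype's actual API; they only show the dependency-injection shape being described, where a per-cluster-manager factory assembles scheduler components behind narrow stand-in interfaces so that a Kubernetes integration never references CoarseGrainedSchedulerBackend or TaskSchedulerImpl directly.

{code:scala}
// Stand-ins for Spark's internal TaskScheduler and SchedulerBackend, so the
// sketch is self-contained and does not touch Spark internals.
trait TaskSchedulerLike {
  def start(): Unit
  def stop(): Unit
}

trait SchedulerBackendLike {
  def start(): Unit
  def stop(): Unit
  def requestTotalExecutors(n: Int): Boolean
}

// Hypothetical factory: one implementation per cluster manager, wired in by the
// framework based on the master URL.
trait ClusterManagerComponentFactory {
  def canCreate(masterURL: String): Boolean
  def createComponents(appName: String, masterURL: String): (TaskSchedulerLike, SchedulerBackendLike)
}

class KubernetesComponentFactory extends ClusterManagerComponentFactory {
  override def canCreate(masterURL: String): Boolean = masterURL.startsWith("k8s://")

  override def createComponents(appName: String, masterURL: String): (TaskSchedulerLike, SchedulerBackendLike) = {
    val backend = new SchedulerBackendLike {
      def start(): Unit = println(s"requesting executor pods for $appName via $masterURL")
      def stop(): Unit = println("deleting executor pods")
      def requestTotalExecutors(n: Int): Boolean = true
    }
    val scheduler = new TaskSchedulerLike {
      def start(): Unit = backend.start()
      def stop(): Unit = backend.stop()
    }
    (scheduler, backend)
  }
}
{code}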

          The thought experiment shows that there would be a non-trivial amount of complexity that would be introduced if schedulers were to be made truly pluggable. The extra complexity and changes to the existing scheduler might de-stabilize the existing cluster manager support; for example, this prototype reorganizes much of the executor loss coordination logic as well but I haven't tested those changes thoroughly. The alternative to avoid this complexity would be to make CoarseGrainedSchedulerBackend and TaskSchedulerImpl part of the public API, but I'm extremely wary of going down this path because we would not just be exposing an interface, but rather we would be exposing a heavily opinionated implementation. Custom subclasses of CoarseGrainedSchedulerBackend would be expected to be resilient to changes in the implementations of these complex classes.

          Given the results from my experimentation and the drawbacks I've already highlighted in maintaining a separate fork, I still would favor integrating Kubernetes into the existing scheduler framework in-repo and marking it as an experimental feature for several releases, following the precedent of how the SQL and YARN experimental features were built and released in the past. Reynold Xin, where should we go from here?

          aash Andrew Ash added a comment -

          There are definitely challenges in building features that take longer than a release cycle (quarterly for Spark).

We could maintain a long-running feature branch for spark-k8s that lasts several months and then gets merged into Spark in a big-bang merge, with that feature branch living either on apache/spark or in some other community-accessible repo. I don't think there are many practical differences between hosting the source in apache/spark versus a different repo if neither is part of Apache releases.

          Or we could merge many smaller commits for spark-k8s into the apache/spark master branch along the way and release as an experimental feature when release time comes. This enables more continuous code review but has the risk of destabilizing the master branch if code reviews miss things.

          Looking to past instances of large features spanning multiple release cycles (like SparkSQL and YARN integration), both of those had work happening primarily in-repo from what I can tell, and releases included large disclaimers in release notes for those experimental features. That precedent seems to suggest Kubernetes integration should follow a similar path.

Personally I lean towards the approach of more, smaller commits into master rather than a long-running feature branch. By code reviewing PRs into the main repo as we go, the feature will be easier to review and will also get wider feedback as an experimental feature than a side branch or side repo would get. This also serves to include Apache committers from the start in understanding the codebase, rather than foisting a foreign codebase onto the project and hoping committers grok it well enough to hold the line on high-quality code reviews. Looking to the future where Kubernetes integration is potentially included in the mainline Apache release (like Mesos and YARN), it's best to work as contributor + committer together from the start for shared understanding.

Making an API for third-party cluster managers sounds great and is the easy, clean choice from a software engineering point of view, but I wonder how much practical value a pluggable cluster manager actually gets the Apache project. It seems like both Two Sigma and IBM have been able to maintain their proprietary schedulers without the benefits of the API we're considering building. Who / what workflows are we aiming to support with an API?

          lins05 Shuai Lin added a comment -

Quoting Matt Cheah:

"If I had to choose between maintaining a fork versus cleaning up the scheduler to make a public API, I would choose the latter in the interest of clarifying the relationship between the K8s effort and the mainline project, as well as for making the scheduler code cleaner in general."

Adding support for a pluggable scheduler backend in Spark is cool. AFAIK there are some custom scheduler backends for Spark, and they are using forked versions of Spark due to the lack of pluggable scheduler backend support:

• Two Sigma's Spark fork, which added scheduler support for their Cook scheduler
• IBM also has a custom "Spark Session Scheduler", which they shared at last month's MesosCon Asia

Quoting again:

"we could include the K8s scheduler in the Apache releases as an experimental feature, ignore its bugs and test failures for the next few releases (that is, problems in the K8s-related code should never block releases)"

I'm afraid that doesn't sound like good practice.

          mcheah Matt Cheah added a comment -

          Reynold Xin - thanks for thinking about this!

          The concerns around testing and support burden are certainly valid. The alternatives also come with their own sets of concerns though.

          If we publish the scheduler as a library:

          The current code in the schedulers is not marked as public API. We would need to refactor the scheduler code to make an API the Apache project would support for third party use (like the K8s integration). There are (at least) two places this needs to be done:

          • CoarseGrainedSchedulerBackend would need to become extendable, since all of the schedulers (standalone, Mesos, yarn-client, and yarn-cluster) currently extend this fairly complex class. The CoarseGrainedSchedulerBackend code invokes its pluggable methods (doRequestTotalExecutors and doKillExecutors) with particular expectations, and hence these expectations would also have to remain stable as long as it is a public API.
• SparkSubmit would need to support third-party cluster managers. Currently SparkSubmit's code includes special-case handling for the bundled cluster managers; for example, in YARN mode spark-submit accepts --queue to specify the queue to run the job in. Thus we would need to make the spark-submit argument handling pluggable as well for other cluster managers' parameters.

          Off the top of my head, I could think of numerous ways we could expose both of these as plugins, but it's not immediately obvious what the best option is.
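As a purely illustrative sketch of the first point, here is what such a subclass contract looks like, using stand-in classes rather than Spark's real hierarchy. Only the hook names doRequestTotalExecutors and doKillExecutors mirror the actual protected methods; the base class, return conventions, and the pod operations are assumptions for this example.

{code:scala}
import scala.concurrent.Future

// Stand-in for the contract the comment describes: the base class decides *when*
// to scale executors, and calls protected hooks that subclasses implement to
// decide *how*. These hooks are expected to answer asynchronously.
abstract class CoarseGrainedLikeBackend {
  protected def doRequestTotalExecutors(requestedTotal: Int): Future[Boolean]
  protected def doKillExecutors(executorIds: Seq[String]): Future[Boolean]
}

// A Kubernetes-flavoured subclass would translate these hooks into pod operations.
class KubernetesLikeBackend extends CoarseGrainedLikeBackend {
  override protected def doRequestTotalExecutors(requestedTotal: Int): Future[Boolean] =
    Future.successful { println(s"scale executor pods to $requestedTotal"); true }

  override protected def doKillExecutors(executorIds: Seq[String]): Future[Boolean] =
    Future.successful { println(s"delete executor pods: ${executorIds.mkString(", ")}"); true }
}
{code}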

          If we fork the project:

          Maintaining a fork places burden on the fork maintainers to keep the fork up to date with the mainline releases. It also makes it unclear what the relationship between this feature and its associated fork is with the direction of the Spark project as a whole, and what the timeline is for eventual re-integration of the fork. Is there a prior example of this approach working in practice in the Spark community?

          In either case (library or forking), there's also the question of how we encourage alpha testing and early usage of this feature. If the code is not on the mainline branch, there needs to be alternative channels outside of the Spark releases themselves to announce that this feature is available and that we would like feedback on it. It would also be ideal for the code reviews to be visible early on, so that everyone that watches the Spark repository can catch the updates and progress of this feature.

          Having said all of this, I think these issues can be navigated. If I had to choose between maintaining a fork versus cleaning up the scheduler to make a public API, I would choose the latter in the interest of clarifying the relationship between the K8s effort and the mainline project, as well as for making the scheduler code cleaner in general. However it's not immediately clear if the effort required to make these refactors is worthwhile when we could include the K8s scheduler in the Apache releases as an experimental feature, ignore its bugs and test failures for the next few releases (that is, problems in the K8s-related code should never block releases), and ship this as we currently do with YARN and Mesos.

          I'd like to hear everyone's thoughts regarding the tradeoffs we are making between these different approaches of pushing this feature forward.

          rxin Reynold Xin added a comment -

          In the past few days I've given this a lot of thought.

I'm personally very interested in this work, and would actually use it myself. That said, based on my experience, the real work starts after the initial thing works, i.e. the maintenance and enhancement work in the future will be much larger than the initial commit. Adding another officially supported scheduler definitely has some serious (and maybe disruptive) impacts on Spark. Some examples:

          1. Testing becomes more complicated.
2. Related to 1, releases become more likely to be delayed. In the past, many Spark releases were delayed due to bugs in the Mesos or YARN integrations, because those are harder to test reliably in an automated fashion.
          3. The release process has to change.

Given that Kubernetes is still very young, and it is unclear how successful it will be in the future (I personally think it will be, but you never know), I would make the following concrete recommendations on moving this forward:

1. See if we can implement this as an add-on (library) outside Spark. If that is not possible, what about a fork?
2. Publish some non-official docker images so it is easy to use Spark on Kubernetes this way.
3. Encourage users to use it and get feedback. Have the contributors that are really interested in this work maintain it for a couple of Spark releases (this includes testing the implementation, publishing new docker images, and writing documentation).
4. Evaluate later (say, after 2 releases) how well this has been received, and decide whether to make a coordinated effort to merge this into Spark, since it might become the most popular cluster manager.

          eje Erik Erlandson added a comment -

          As I understand it (and as I've built them) an "MVP" Apache Spark docker image consists of:

          1. Some base OS image, presumably some variant of linux, with some package management, but starting from some minimalist install
          2. A spark-compatible JRE
3. (if Python support is needed) whatever standard Python packages are required to run PySpark
          4. A Spark distro, likely installed from an official distro tarball

Hopefully I'm not over-simplifying, but IIUC the licensing around all of those is well understood and known to be FOSS compatible. Other non-minimal or non-standard image builds are definitely possible, but I'd consider those to be under the purview of 3rd parties in the community.

          Publishing "official Apache Spark" images would imply some new responsibilities, including maintenance. A possible roadmap might be to add "official" images as part of a subsequent phase, drawing on experience with phase 1. A separate registry organization could in principle be used, for example: https://hub.docker.com/u/k8s4spark/

          A consequence of not having such an official image is that integration testing would then be based, at least initially, on 3rd-party images.

          srowen Sean Owen added a comment -

Generally, ASF projects produce source rather than binaries, but certainly also publish binary artifacts as a convenience. A docker image falls into that category. The drawback - and it's really not trivial - is that it means we are distributing a bunch more third-party software and now have to police the license of everything in that image. That alone may be a deal breaker depending on what exact bits must be distributed.

          Aside from that the question is maintenance burden vs convenience for users. I think ASF projects don't generally distribute 'packagings' but do distribute binary forms of source artifacts. For example, Spark doesn't publish RPMs or VM images.

          aash Andrew Ash added a comment -

          Reynold Xin is it a problem for ASF projects to publish docker images? Have other Apache projects done this before, or would Spark be the first one?

          foxish Anirudh Ramanathan added a comment -

          There is a way to use a standard image that already exists (say ubuntu) and download the distribution and dependencies onto it prior to running drivers and executors. I explored this initially but even if this were allowed for, it's not likely to be used much.

From talking to people looking to use Spark on Kubernetes, it appears that they'd prefer either an official image or to build their own image containing the distribution and application jars.

          eje Erik Erlandson added a comment -

          Not publishing images puts users in the position of not being able to run this out-of-the-box. First they would have to either build images themselves, or find somebody else's 3rd-party images, etc. It doesn't seem like it would make for good UX.

          eje Erik Erlandson added a comment -

          A possible scheme might be to publish the docker-files, but not actually build the images. It seems more standard to actually publish images for the community. Is there some reason for not wanting to do that?

          rxin Reynold Xin added a comment -

          Is there a way to get this working without the project having to publish docker images?

          mcheah Matt Cheah added a comment -

          There has also been some discussion about this on the document that has been the working draft for the attached proposal: https://docs.google.com/document/d/1_bBzOZ8rKiOSjQg78DXOA3ZBIo_KkDJjqxVuq0yXdew/edit?usp=sharing

          mcheah Matt Cheah added a comment -

          I attached a proposal outlining a potential long term plan for this feature. Any feedback about it would be appreciated.

          apachespark Apache Spark added a comment -

          User 'erikerlandson' has created a pull request for this issue:
          https://github.com/apache/spark/pull/16061

          eje Erik Erlandson added a comment -

Another comment on external plug-ins for scheduling: although I think it's a good idea to support them, it does introduce the maintenance burden of keeping external scheduling packages in sync with the main Apache Spark project. That is another argument for first-class support for schedulers of sufficient importance to the community.

          eje Erik Erlandson added a comment -

I agree with William Benton that Kube is a sufficiently popular container management system that it warrants "first-class" sub-project status for Apache Spark.

          I'm also interested in making modifications to the Spark scheduler support so that it is easier to plug-in new ones externally. I believe the necessary modifications would not be very intrusive. The system is already based on sub-classing abstract traits. It would be mostly a matter of increasing their exposure.

willbenton William Benton added a comment - edited

          Sean Owen Currently ExternalClusterManager is Spark-private, so there isn't a great way to implement a new scheduler backend outside of Spark proper. I think it would be great if an extension API for new cluster managers were public and developers could work with it with some expectation of stability! But even if this API were exposed and (relatively) stable, I think there's a good argument that if any cluster managers besides standalone are to live in Spark proper that a Kubernetes scheduler should be there, too.
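For context, the Spark-private trait being referred to has roughly the following shape in Spark 2.x. This is paraphrased from memory rather than copied from the Spark source (see org.apache.spark.scheduler.ExternalClusterManager for the authoritative definition); the private[spark] modifier is exactly what keeps third parties from implementing it outside Spark's own source tree with any stability guarantee.

{code:scala}
// Approximate shape of Spark's internal extension point for cluster managers.
// Because it is private[spark], this trait can only be implemented from within
// the org.apache.spark package, i.e. inside the Spark source tree.
package org.apache.spark.scheduler

import org.apache.spark.SparkContext

private[spark] trait ExternalClusterManager {
  def canCreate(masterURL: String): Boolean
  def createTaskScheduler(sc: SparkContext, masterURL: String): TaskScheduler
  def createSchedulerBackend(sc: SparkContext, masterURL: String, scheduler: TaskScheduler): SchedulerBackend
  def initialize(scheduler: TaskScheduler, backend: SchedulerBackend): Unit
}
{code}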

          (Why draw the line just past Mesos and YARN when Kubernetes also enjoys a large community and many deployments? And if the Spark community is to draw the line somewhere, why not draw it around the standalone scheduler? If we're using the messaging connectors in Bahir as an example, then historical precedent isn't an argument for keeping existing schedulers in Spark proper.)

          srowen Sean Owen added a comment -

          I think ideally this should live outside Spark. Several connector-like projects have moved to Bahir. It's relatively harder to maintain it in the Spark project.

          foxish Anirudh Ramanathan added a comment -

          Corresponding issue in kubernetes:
          https://github.com/kubernetes/kubernetes/issues/34377

eje Erik Erlandson added a comment -

Current prototype:
https://github.com/foxish/spark/tree/k8s-support
https://github.com/foxish/spark/pull/1

People

• Assignee: Unassigned
• Reporter: eje Erik Erlandson
• Shepherd: Matei Zaharia
• Votes: 34
• Watchers: 95