SPARK-45923

Spark Kubernetes Operator

Details

    Description

      We would like to develop a Java-based Kubernetes operator for Apache Spark. Following the operator pattern (https://kubernetes.io/docs/concepts/extend-kubernetes/operator/), Spark users can manage applications and related components using native tools such as kubectl. The primary goal is to simplify the Spark user experience on Kubernetes by minimizing the learning curve and operational complexity, so that users can focus on Spark application development.
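
      For readers unfamiliar with the operator pattern, the core idea is a control loop that compares the declared (desired) state with the observed state and derives the actions needed to converge them. The following is only a minimal, library-free sketch of that idea, not the design proposed in the SPIP: the class name `ReconcileSketch`, the `reconcile` method, and the use of an executor count as the only piece of state are all hypothetical and chosen purely for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration of a reconcile step; not the operator's actual API.
public class ReconcileSketch {

    /**
     * Compares desired state (application name -> executor count, as a user
     * might declare it) with observed state (what is currently running) and
     * returns the actions that would move the observed state toward the desired one.
     */
    static List<String> reconcile(Map<String, Integer> desired, Map<String, Integer> observed) {
        List<String> actions = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : desired.entrySet()) {
            String name = entry.getKey();
            int wanted = entry.getValue();
            Integer running = observed.get(name);
            if (running == null) {
                // Declared but not running yet: create it.
                actions.add("create " + name + " executors=" + wanted);
            } else if (running.intValue() != wanted) {
                // Running with the wrong size: scale it.
                actions.add("scale " + name + " " + running + "->" + wanted);
            }
        }
        for (String name : observed.keySet()) {
            if (!desired.containsKey(name)) {
                // Running but no longer declared: clean it up.
                actions.add("delete " + name);
            }
        }
        return actions;
    }

    public static void main(String[] args) {
        // TreeMap keeps iteration order deterministic for the example.
        Map<String, Integer> desired = new TreeMap<>();
        desired.put("pi", 3);
        desired.put("sql", 2);

        Map<String, Integer> observed = new TreeMap<>();
        observed.put("pi", 1);
        observed.put("old-job", 4);

        reconcile(desired, observed).forEach(System.out::println);
    }
}
```

      Running the step again once the observed state matches the desired state yields no actions, which is the property that lets such a loop be re-run safely after any event or failure.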

      Ideally, it would reside in a separate repository (like Spark docker or Spark connect golang) and be loosely coupled to the Spark release cycle while supporting multiple Spark versions.

      SPIP doc: https://docs.google.com/document/d/1f5mm9VpSKeWC72Y9IiKN2jbBn32rHxjWKUfLRaGEcLE

      Dev email discussion: https://lists.apache.org/thread/wdy7jfhf7m8jy74p6s0npjfd15ym5rxz

      Sub-Tasks

          1. Add License to Spark Operator (Sub-task, Resolved, Zhou JIANG)
          2. Setup gradle as build tool for operator repository (Sub-task, Resolved, Zhou JIANG)
          3. Setup Static Analysis for Operator (Sub-task, Resolved, Zhou JIANG)
          4. Add Operator CI Task for Java Build and Test (Sub-task, Resolved, Zhou JIANG)
          5. Add Java API Module for Spark Operator (Sub-task, Resolved, Zhou JIANG)
          6. Use API Group `spark.apache.org` (Sub-task, Resolved, Dongjoon Hyun)
          7. Update `build.gradle` to fix deprecation warnings (Sub-task, Resolved, Dongjoon Hyun)
          8. Promote `KubernetesVolumeUtils` to `DeveloperApi` (Sub-task, Resolved, Dongjoon Hyun)
          9. Promote `KubernetesClientUtils` to `DeveloperApi` (Sub-task, Resolved, Dongjoon Hyun)
          10. Promote `o.a.s.d.k8s.Constants` to `DeveloperApi` (Sub-task, Resolved, Dongjoon Hyun)
          11. Promote `*MainAppResource` and `NonJVMResource` to `DeveloperApi` (Sub-task, Resolved, Dongjoon Hyun)
          12. Promote `KubernetesDriverBuilder` to `DeveloperApi` (Sub-task, Resolved, Zhou JIANG)
          13. Promote `KubernetesDriverSpec` to `DeveloperApi` (Sub-task, Resolved, Zhou JIANG)
          14. Promote `KubernetesDriverConf` to `DeveloperApi` (Sub-task, Resolved, Zhou JIANG)
          15. Promote `PrometheusServlet` to `DeveloperApi` (Sub-task, Resolved, Zhou JIANG)
          16. Add Spark application submission worker for operator (Sub-task, Resolved, Zhou JIANG)
          17. Enable autolink to SPARK jira issue (Sub-task, Resolved, Dongjoon Hyun)
          18. Use the official Apache Spark 4.0.0-preview1 (Sub-task, Resolved, Dongjoon Hyun)
          19. Upgrade checkstyle and spotbugs version (Sub-task, Resolved, Zhou JIANG)
          20. Define Config Loading Framework for Spark Operator Controller (Sub-task, Resolved, Zhou JIANG)
          21. Use `BasePluginExtension` in `spark-operator/build.gradle` (Sub-task, Resolved, Dongjoon Hyun)
          22. Upgrade Gradle to 8.9 (Sub-task, Resolved, Dongjoon Hyun)
          23. Upgrade Gradle to 8.10 (Sub-task, Resolved, Dongjoon Hyun)
          24. Avoid unnecessary task configuration in `spark-operator-api` (Sub-task, Resolved, Dongjoon Hyun)
          25. Fix `spark-operator` module to define test framework explicitly (Sub-task, Resolved, Dongjoon Hyun)
          26. Add `reconciler` to `spark-operator` module (Sub-task, Resolved, Zhou JIANG)
          27. Ban `org.apache.commons.collections` in favor of `collections4` (Sub-task, Resolved, Dongjoon Hyun)
          28. Add Controller Metrics System and Utils (Sub-task, Resolved, Zhou JIANG)
          29. Use `try-with-resources` to test `KubernetesClientFactory.buildKubernetesClient` (Sub-task, Resolved, Dongjoon Hyun)
          30. Avoid `raw` type usage (Sub-task, Resolved, Dongjoon Hyun)
          31. Exclude `JUnitAssertionsShouldIncludeMessage/JUnitTestContainsTooManyAsserts` PMD rules and simplify test code (Sub-task, Resolved, Dongjoon Hyun)
          32. Add docker image build for operator (Sub-task, Resolved, Zhou JIANG)
          33. Add SparkOperator class and tests (Sub-task, Resolved, Zhou JIANG)
          34. Fix `ENTRYPOINT` to point `/opt/spark-operator/operator/docker-entrypoint.sh` (Sub-task, Resolved, Dongjoon Hyun)
          35. Verify built images in `build-image` CI job via `docker run` test (Sub-task, Resolved, Dongjoon Hyun)
          36. Minimize docker image by removing redundant `chown` commands (Sub-task, Resolved, Dongjoon Hyun)
          37. Remove `commons-lang3` dependency from `spark-operator-api` (Sub-task, Resolved, Dongjoon Hyun)
          38. Reduce `spark-operator` fat jar size by excluding dependencies (Sub-task, Resolved, Dongjoon Hyun)
          39. Use the latest `setup-java` v4 with `cache` feature (Sub-task, Resolved, Dongjoon Hyun)
          40. Use the latest PMD 6.x rules instead of the deprecated ones (Sub-task, Resolved, Dongjoon Hyun)
          41. Increase `Gradle` JVM memory to `4g` like Spark repo (Sub-task, Resolved, Dongjoon Hyun)
          42. Enforce ImmutableField and UselessParentheses rules (Sub-task, Resolved, William Hyun)
          43. Enforce SignatureDeclareThrowsException and AvoidThrowingRawExceptionTypes rules (Sub-task, Resolved, William Hyun)
          44. Enforce ConfusingTernary and PrematureDeclaration rules (Sub-task, Resolved, William Hyun)
          45. Enforce `FieldDeclarationsShouldBeAtStartOfClass`, `LinguisticNaming` and `ClassWithOnlyPrivateConstructorsShouldBeFinal` rules (Sub-task, Resolved, William Hyun)
          46. Fix RestartPolicyTest to cover `SchedulingFailure` (Sub-task, Resolved, Dongjoon Hyun)
          47. Enforce UseUtilityClass rule (Sub-task, Resolved, William Hyun)
          48. Add `OpenContainers` Annotations to docker image (Sub-task, Resolved, Dongjoon Hyun)
          49. Update `README.md` with `build/test/CI` info (Sub-task, Resolved, Dongjoon Hyun)
          50. Speed up docker image building by excluding `check` instead of `test` (Sub-task, Resolved, Dongjoon Hyun)
          51. Add `k8s-integration-tests` GitHub Action CI job (Sub-task, Resolved, Dongjoon Hyun)
          52. Add Helm Chart (Sub-task, Resolved, Zhou JIANG)
          53. Install and test Helm chart in K8s integration test CI (Sub-task, Resolved, Dongjoon Hyun)
          54. Upgrade `kubernetes-client` to 6.13.3 and `commons-lang3` to 3.16.0 (Sub-task, Resolved, Dongjoon Hyun)
          55. Add `pi.yaml` example and update README.md (Sub-task, Resolved, Dongjoon Hyun)
          56. Add `pi-with-one-pod.yaml` example (Sub-task, Resolved, Dongjoon Hyun)
          57. Add `pi-on-yunikorn.yaml` example (Sub-task, Resolved, Dongjoon Hyun)
          58. Add `sql.yaml` example (Sub-task, Resolved, Dongjoon Hyun)
          59. Improve `gradlew` to support both `curl` and `wget` (Sub-task, Resolved, Dongjoon Hyun)
          60. Refactor `RestartPolicyTest` to test per case (Sub-task, Resolved, Dongjoon Hyun)
          61. Revise `reconcilesteps` package and `SparkAppReconciler` (Sub-task, Resolved, Dongjoon Hyun)
          62. Revise `observers` package (Sub-task, Resolved, Dongjoon Hyun)
          63. Fix `docker-entrypoint.sh` to quote the environment variables (Sub-task, Resolved, Dongjoon Hyun)
          64. Fix `javadoc` generation and add `lint` test pipeline to prevent (Sub-task, Resolved, Dongjoon Hyun)
          65. Add `_MESSAGE` postfix to `DRIVER_(READY|RUNNING)` (Sub-task, Resolved, Dongjoon Hyun)
          66. Add `NOTICE`, `NOTICE-binary`, `LICENSE-binary` files and update `*.gradle` files (Sub-task, Resolved, Dongjoon Hyun)
          67. Add `deploy.gradle` to support publish-related tasks (Sub-task, Resolved, Dongjoon Hyun)
          68. Add `buildDockerImage` Gradle Task (Sub-task, Resolved, Dongjoon Hyun)
          69. Add `-SNAPSHOT` postfix to `Spark Operator` version (Sub-task, Resolved, Dongjoon Hyun)
          70. Revise `Spark Operator` docker image (Sub-task, Resolved, Dongjoon Hyun)
          71. Use `HTTP_*` constant variables instead of magic numbers (Sub-task, Resolved, Dongjoon Hyun)
          72. Generalize `relocateGeneratedCRD` Gradle Task to handle `*.spark.apache.org-v1.yml` (Sub-task, Resolved, Dongjoon Hyun)
          73. Generalize `printer-columns.sh` to handle `*.spark.apache.org-v1.yml` files (Sub-task, Resolved, Dongjoon Hyun)
          74. Add `SparkCluster` to `spark-operator-api` module and examples (Sub-task, Resolved, Dongjoon Hyun)
          75. Add `SparkCluster` to `spark-submission-worker` module (Sub-task, Resolved, Dongjoon Hyun)
          76. Add `SparkCluster` to `spark-operator` module (Sub-task, Resolved, Dongjoon Hyun)
          77. Propagate Spark configurations to SparkCluster (Sub-task, Resolved, Dongjoon Hyun)
          78. Use `setup-minikube` GitHub Action (Sub-task, Resolved, Qi Tan)
          79. Add K8s service for Workers to SparkClusterResourceSpec (Sub-task, Resolved, Dongjoon Hyun)
          80. Revise `InstanceConfig` to `ExecutorInstanceConfig` class (Sub-task, Resolved, Dongjoon Hyun)
          81. Add e2e test in operator workflow (Sub-task, Resolved, Qi Tan)
          82. Document `SparkCluster` and add `submit-pi-to-prod.sh` example (Sub-task, Resolved, Dongjoon Hyun)
          83. Document nightly versions of operator image and Helm Chart (Sub-task, Resolved, Dongjoon Hyun)
          84. Add `publish_snapshot_dockerhub.yml` Daily GitHub Action job (Sub-task, Resolved, Dongjoon Hyun)
          85. Support `schedulerName` for SparkCluster (Sub-task, Resolved, Dongjoon Hyun)
          86. Add `publish_snapshot_chart.yml` GitHub Action job (Sub-task, Resolved, Dongjoon Hyun)
          87. Use `rsync` to upload to `nightlies` (Sub-task, Resolved, Dongjoon Hyun)
          88. Fix `Dockerfile` by removing unused `ARG` from builder and moving default value (Sub-task, Resolved, Dongjoon Hyun)
          89. Simplify snapshot HelmChart to use `apache/spark-kubernetes-operator:main-snapshot` by default (Sub-task, Resolved, Dongjoon Hyun)
          90. Introduce ClusterToleration and WorkerInstanceConfig (Sub-task, Resolved, Zhou JIANG)
          91. E2E Tests Catch Step SparkApplication Kind Not found (Sub-task, Resolved, Qi Tan)
          92. Enable Pull Request Labeler (Sub-task, Resolved, Dongjoon Hyun)
          93. Support user provided spec for SparkCluster (Sub-task, Resolved, Zhou JIANG)
          94. Add `cluster-with-template.yaml` (Sub-task, Resolved, Dongjoon Hyun)
          95. E2E Test Components: State Transition (Sub-task, Resolved, Qi Tan)
          96. Remove `SPARK_NO_DAEMONIZE` in favor of live log UIs (Sub-task, Resolved, Dongjoon Hyun)
          97. Support `master|worker` container templates (Sub-task, Resolved, Dongjoon Hyun)
          98. Add `8081` port to Worker resource spec (Sub-task, Resolved, Dongjoon Hyun)
          99. Add `Clean Up` section to `README.md` (Sub-task, Resolved, Dongjoon Hyun)
          100. Use Gradle Version Catalog (Sub-task, Resolved, Dongjoon Hyun)
          101. Add `log4j2` default setting to `values.yaml` (Sub-task, Resolved, Dongjoon Hyun)
          102. Add `pi-scala.yaml` and `pyspark-pi.yaml` (Sub-task, Resolved, Zhou JIANG)
          103. Adjust `ERROR`-level log messages (Sub-task, Resolved, Dongjoon Hyun)
          104. Hot Properties Reload E2E Test (Sub-task, Resolved, Qi Tan)
          105. Fix K8s version in GitHub Action CI (Sub-task, Resolved, Dongjoon Hyun)
          106. Upgrade the minimum K8s version to v1.28 (Sub-task, Resolved, Dongjoon Hyun)
          107. Generate Spark Operator Config Property Doc (Sub-task, Resolved, Zhou JIANG)
          108. E2E test template includes invalid spec field (Sub-task, Resolved, Zhou JIANG)
          109. Upgrade Gradle to 8.10.1 (Sub-task, Resolved, Dongjoon Hyun)
          110. Refactor prefix `appResources` to `workloadResources` (Sub-task, Resolved, Zhou JIANG)
          111. Spark Cluster Happy Path State Transition Test (Sub-task, Resolved, Qi Tan)
          112. Update `e2e/python/chainsaw-test.yaml` to use non-R image (Sub-task, Resolved, Dongjoon Hyun)
          113. Two instances of Spark Operator running at the same time (Sub-task, Resolved, Qi Tan)
          114. E2E Workflow Refactor (Sub-task, Resolved, Qi Tan)
          115. Use `apache/spark` images instead of `spark` (Sub-task, Resolved, Dongjoon Hyun)
          116. Use `spark-examples.jar` instead of `spark-examples_2.13-4.0.0-preview1.jar` (Sub-task, Resolved, Dongjoon Hyun)
          117. Add `Java 21`-based `SparkPi` example (Sub-task, Resolved, Dongjoon Hyun)
          118. Add `Java 21`-based `SparkCluster` example (Sub-task, Resolved, Dongjoon Hyun)
          119. Add Java 21 in the e2e test (Sub-task, Resolved, Qi Tan)
          120. Upgrade Spark to `4.0.0-preview2` (Sub-task, Resolved, Dongjoon Hyun)
          121. Upgrade `README`, examples, tests to use `preview2` (Sub-task, Resolved, Dongjoon Hyun)
          122. Support HPA for `SparkCluster` (Sub-task, Resolved, Dongjoon Hyun)
          123. Support `HPA` template for `SparkCluster` (Sub-task, Resolved, Dongjoon Hyun)
          124. Remove (master|worker) prefix from field names of `(Master|Worker)Spec` (Sub-task, Resolved, Dongjoon Hyun)
          125. Update E2E tests to use Spark 3.5.3 (Sub-task, Resolved, Dongjoon Hyun)
          126. Fix the error when enable the sparkApplicationSentinel (Sub-task, Resolved, Qi Tan)
          127. Provide empty `RuntimeVersions` object to `ClusterSpec.runtimeVersions` by default (Sub-task, Resolved, Dongjoon Hyun)
          128. Make `o.a.s.k8s.operator.utils.Utils` argument naming consistent (Sub-task, Resolved, Dongjoon Hyun)
          129. Add `spark-version` label to `Spark Cluster` resources (Sub-task, Resolved, Dongjoon Hyun)
          130. Add docs for operator (Sub-task, Resolved, Zhou JIANG)
          131. Update `cluster-with-template.yaml` example with pod annotation (Sub-task, Resolved, Dongjoon Hyun)
          132. Use `registry.k8s.io/pause:3.9` to avoid failure deterministically (Sub-task, Resolved, Dongjoon Hyun)
          133. Upgrade Gradle to 8.11 (Sub-task, Resolved, Dongjoon Hyun)
          134. Update docs with YuniKorn 1.6.0 and first-time installation guide (Sub-task, Resolved, Zhou JIANG)
          135. Upgrade `kubernetes-client` to 6.13.4 and `log4j` to 2.24.2 (Sub-task, Resolved, Dongjoon Hyun)

            People

              ZhouJIANG Zhou Jiang
              ZhouJIANG Zhou Jiang
              L. C. Hsieh L. C. Hsieh