[ZEPPELIN-3020] Add support to run Spark interpreter on a Kubernetes cluster - ASF JIRA

Details

Type: New Feature
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 0.9.0
Component/s: None
Labels: None

Description

The goal of this PR (https://github.com/apache/zeppelin/pull/2637) is to execute Spark notebooks on Kubernetes in cluster mode, so that the Spark driver runs inside the Kubernetes cluster. It is based on https://github.com/apache-spark-on-k8s/spark. Zeppelin uses `spark-submit` to start RemoteInterpreterServer, which executes notebooks on Spark. Kubernetes-specific `spark-submit` parameters, such as the driver, executor, init container, and shuffle images, should be set in the SPARK_SUBMIT_OPTIONS environment variable. When the Spark interpreter is configured with a Kubernetes-specific Spark master URL (k8s://https....), RemoteInterpreterServer is launched inside a Spark driver pod on Kubernetes, so the Zeppelin server has to be able to connect to that remote server. In a Kubernetes cluster, the best solution for this is to create a Kubernetes service for RemoteInterpreterServer. This is the reason for SparkK8RemoteInterpreterManagerProcess, which extends RemoteInterpreterManagerProcess: it creates the Kubernetes service, mapping the port of RemoteInterpreterServer in the driver pod, and connects to this service once the Spark driver pod is in the Running state.
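
The control flow described above can be sketched roughly as follows. This is an illustration only, not the code from the PR: the function names and the `get_driver_phase`, `create_service`, and `connect` callables are hypothetical stand-ins for whatever Kubernetes client calls SparkK8RemoteInterpreterManagerProcess actually makes. The only facts assumed are the `k8s://` master URL prefix mentioned in the description and the standard Kubernetes pod phases (Pending, Running, Succeeded, Failed, Unknown).

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

K8S_MASTER_PREFIX = "k8s://"  # prefix of a Kubernetes-specific Spark master URL


def is_k8s_master(master_url: str) -> bool:
    """Return True if the Spark master URL targets Kubernetes."""
    return master_url.startswith(K8S_MASTER_PREFIX)


def wait_for_running(
    get_phase: Callable[[], str],
    timeout_s: float = 60.0,
    poll_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll the driver pod phase until it is Running.

    Returns False if the pod reaches a terminal phase or the timeout expires.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        phase = get_phase()
        if phase == "Running":
            return True
        if phase in ("Failed", "Succeeded"):
            # Terminal phases: the pod will never become Running.
            return False
        if time.monotonic() >= deadline:
            return False
        sleep(poll_s)


def open_interpreter_connection(
    master_url: str,
    get_driver_phase: Callable[[], str],  # hypothetical: reads the driver pod phase
    create_service: Callable[[], str],    # hypothetical: creates the service, returns its endpoint
    connect: Callable[[str], T],          # hypothetical: connects to RemoteInterpreterServer
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Create the service, wait for the driver pod, then connect through the service."""
    if not is_k8s_master(master_url):
        raise ValueError("not a Kubernetes master URL: " + master_url)
    endpoint = create_service()
    if not wait_for_running(get_driver_phase, sleep=sleep):
        raise RuntimeError("Spark driver pod did not reach the Running state")
    return connect(endpoint)
```

Passing the Kubernetes operations in as callables keeps the sketch independent of any particular client library; a real implementation would replace them with calls to the cluster API.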

Issue Links

duplicates

ZEPPELIN-3785 Add kubernetes scheduling to spark interpreter

Closed

links to

GitHub Pull Request #2637

People

Assignee: Unassigned

Reporter: Janos Matyas

Votes: 2

Watchers: 6

Dates

Created: 31/Oct/17 15:16

Updated: 24/Dec/20 03:13

Resolved: 14/Oct/20 13:58