Details
- Type: New Feature
- Status: Closed
- Priority: Major
- Resolution: Fixed
- None
- None
- None
Description
The goal of this PR (https://github.com/apache/zeppelin/pull/2637) is to make it possible to execute Spark notebooks on Kubernetes in cluster mode, so that the Spark driver runs inside the Kubernetes cluster, based on https://github.com/apache-spark-on-k8s/spark.

Zeppelin uses `spark-submit` to start the RemoteInterpreterServer, which executes notebooks on Spark. Kubernetes-specific `spark-submit` parameters, such as the driver, executor, init-container and shuffle images, should be set in the SPARK_SUBMIT_OPTIONS environment variable.

If the Spark interpreter is configured with a Kubernetes-specific Spark master URL (k8s://https...), the RemoteInterpreterServer is launched inside a Spark driver pod on Kubernetes, so the Zeppelin server has to be able to connect to that remote server. Within a Kubernetes cluster, the best way to do this is to create a Kubernetes service for the RemoteInterpreterServer. This is why SparkK8RemoteInterpreterManagerProcess was added. It extends RemoteInterpreterManagerProcess, creates a Kubernetes service that maps the RemoteInterpreterServer port in the driver pod, and connects to this service once the Spark driver pod is in the Running state.
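The connection flow described above can be sketched in a few lines. This is a minimal Python sketch under stated assumptions, not the code from the PR. It does four things:

- checks for a `k8s://` master URL;
- builds a ClusterIP Service manifest whose selector targets the driver pod and forwards the RemoteInterpreterServer port;
- derives the Service's in-cluster DNS name;
- polls the driver pod phase until it reports `Running`.

The function names, the label keys (`spark-role`, `zeppelin-interpreter-id`), the port 30000 and the `cluster.local` cluster domain are hypothetical. Submitting the manifest and reading the real pod phase would go through the Kubernetes API, which is left out here.

```python
import time
from typing import Callable, Dict

# Spark's native Kubernetes master URLs look like k8s://https://<api-server>:<port>.
K8S_MASTER_PREFIX = "k8s://"


def is_kubernetes_master(master: str) -> bool:
    """True if the Spark master URL targets a Kubernetes API server."""
    return master.startswith(K8S_MASTER_PREFIX)


def interpreter_service_manifest(name: str, namespace: str,
                                 driver_labels: Dict[str, str],
                                 port: int) -> dict:
    """ClusterIP Service forwarding `port` to the pod(s) matching `driver_labels`.

    Assumption: `driver_labels` are labels actually present on the Spark driver
    pod; the keys used in the example below are hypothetical.
    """
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "type": "ClusterIP",
            "selector": dict(driver_labels),
            "ports": [{"name": "interpreter", "port": port, "targetPort": port}],
        },
    }


def service_host(name: str, namespace: str, cluster_domain: str = "cluster.local") -> str:
    """In-cluster DNS name of the Service (assumes the default cluster domain)."""
    return f"{name}.{namespace}.svc.{cluster_domain}"


def wait_for_phase(get_phase: Callable[[], str], wanted: str = "Running",
                   timeout_s: float = 300.0, poll_s: float = 2.0) -> bool:
    """Poll `get_phase` until it returns `wanted`; False on timeout or pod exit."""
    deadline = time.monotonic() + timeout_s
    while True:
        phase = get_phase()
        if phase == wanted:
            return True
        if phase in ("Succeeded", "Failed"):
            # Terminal pod phases: the driver will never reach `wanted`.
            return False
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)


if __name__ == "__main__":
    master = "k8s://https://kubernetes.default.svc:443"  # hypothetical API server
    if is_kubernetes_master(master):
        svc = interpreter_service_manifest(
            "zeppelin-spark-interpreter", "default",
            {"spark-role": "driver", "zeppelin-interpreter-id": "abc123"},  # hypothetical labels
            30000)  # hypothetical RemoteInterpreterServer port
        # Stand-in for querying the driver pod's status.phase from the API server.
        phases = iter(["Pending", "Pending", "Running"])
        if wait_for_phase(lambda: next(phases), poll_s=0.0):
            print("connect to", service_host("zeppelin-spark-interpreter", "default"),
                  svc["spec"]["ports"][0]["port"])
```

Waiting for the driver pod before connecting matters because a Service has no endpoints until a matching pod is running and ready, so an earlier connection attempt would simply fail. Treating `Succeeded` and `Failed` as terminal avoids waiting out the full timeout for a driver that has already exited.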
Issue Links
- duplicates: ZEPPELIN-3785 Add kubernetes scheduling to spark interpreter (Closed)
- links to