Zeppelin / ZEPPELIN-4804

Unable to start Spark Interpreter on Kubernetes

Details

    • Type: Bug
    • Status: Open
    • Priority: Blocker
    • Resolution: Unresolved
    • None
    • None
    • Kubernetes, spark
    • None

    Description

      Hi team,

      I'm trying to install Zeppelin (apache/zeppelin:0.9.0) on AWS EKS. When I try to spin up a Spark interpreter pod (by running just sc.version), it fails with an error saying that no interpreters are running. The pod's log includes the following command:

      /opt/spark/bin/spark-submit \
        --class org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer \
        --driver-class-path ":/zeppelin/interpreter/spark/::/zeppelin/interpreter/zeppelin-interpreter-shaded-0.9.0-preview1.jar:/zeppelin/interpreter/spark/spark-interpreter-0.9.0-preview1.jar" \
        --driver-java-options " -Dfile.encoding=UTF-8 -Dlog4j.configuration=file:///zeppelin/conf/log4j.properties -Dzeppelin.log.file='/zeppelin/logs/zeppelin-interpreter-spark-shared_processspark-refljc.log'" \
        --conf spark.jars.ivy=/tmp/.ivy \
        --master k8s://https://kubernetes.default.svc \
        --deploy-mode client \
        --driver-memory 1g \
        --conf spark.kubernetes.namespace=default \
        --conf spark.executor.instances=1 \
        --conf spark.kubernetes.driver.pod.name=spark-refljc \
        --conf spark.kubernetes.container.image=stx-app-docker-prod-local.artifactory.dbgcloud.io/spark:2.4.5 \
        --conf spark.driver.bindAddress=0.0.0.0 \
        --conf spark.driver.host=spark-refljc.default.svc \
        --conf spark.driver.port=22321 \
        --conf spark.blockManager.port=22322 \
        /zeppelin/interpreter/spark/spark-interpreter-0.9.0-preview1.jar \
        zeppelin-server-695446f7c6-cxd59.default.svc 12320 "spark-shared_process" 12321:12321

      The default master address is being passed in (--master k8s://https://kubernetes.default.svc), but it shouldn't be: I've set the MASTER environment variable to my cluster's API server, and that value is being overridden. I followed the stack trace into the code and found that the method buildSparkSubmitOptions (https://github.com/apache/zeppelin/blob/master/zeppelin-plugins/launcher/k8s-standard/src/main/java/org/apache/zeppelin/interpreter/launcher/K8sRemoteInterpreterProcess.java#L337) is being called, where the line

      options.append(" --master k8s://https://kubernetes.default.svc");

      appears to hardcode this value. Is my reading correct? If so, it would explain why I can't run Spark: because the default value is hardcoded, the Zeppelin pod isn't able to find my cluster's API server.
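
      To make the suspected behaviour concrete, here is a minimal, hypothetical reproduction in Java. Only the append(...) line is taken from the report; the class name HardcodedMasterSketch, the env parameter and the example MASTER value are assumptions for illustration, and the real method builds many more options than this one.

```java
import java.util.Map;

// Hypothetical, stripped-down reproduction of the reported behaviour.
// Only the options.append(...) line is quoted from the report; the class,
// method signature and parameter are illustrative assumptions.
public class HardcodedMasterSketch {

    // The env argument is deliberately unused: the master URL is a string
    // literal, so no environment variable can change what gets appended.
    static String buildSparkSubmitOptions(Map<String, String> env) {
        StringBuilder options = new StringBuilder();
        options.append(" --master k8s://https://kubernetes.default.svc");
        return options.toString();
    }

    public static void main(String[] args) {
        // Simulated environment in which MASTER points at a custom API server
        // (the URL is a made-up example).
        Map<String, String> env = Map.of("MASTER", "k8s://https://example-eks-endpoint:443");
        // The MASTER value above never appears in the printed options.
        System.out.println(buildSparkSubmitOptions(env));
    }
}
```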

       

      I believe that, to fix the issue myself, I would need to remove the line that sets the master and rebuild Zeppelin. Is that right?
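
      As an alternative to deleting the line outright, the option builder could read the master from the MASTER environment variable and keep the in-cluster address only as a fallback. The sketch below assumes that approach; ConfigurableMasterSketch, buildMasterOption and DEFAULT_MASTER are hypothetical names, not part of Zeppelin, and a real patch would have to fit into the existing buildSparkSubmitOptions method.

```java
import java.util.Map;

// Hypothetical patch sketch (not Zeppelin's actual code): honour MASTER
// when it is set, otherwise fall back to the previously hardcoded address.
public class ConfigurableMasterSketch {

    // The value that the reported code currently hardcodes.
    static final String DEFAULT_MASTER = "k8s://https://kubernetes.default.svc";

    // Builds only the --master fragment of the spark-submit options.
    static String buildMasterOption(Map<String, String> env) {
        String master = env.get("MASTER");
        if (master == null || master.isBlank()) {
            master = DEFAULT_MASTER; // keep the old behaviour as the fallback
        }
        return " --master " + master;
    }

    public static void main(String[] args) {
        // Uses the real process environment; MASTER may or may not be set.
        System.out.println(buildMasterOption(System.getenv()));
    }
}
```

      Keeping the fallback preserves the current behaviour for deployments that do not set MASTER, whereas removing the line entirely would leave the choice of master to spark-submit's own defaults.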

People

    • Assignee: Unassigned
    • Reporter: Varad Karmarkar (varadskarmarkar)
    • Votes: 0
    • Watchers: 3
