SPARK-34349: No python3 in docker images


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Cannot Reproduce
    • Affects Version/s: 3.0.1
    • Fix Version/s: None
    • Component/s: Kubernetes
    • Labels: None

    Description

      The spark-py container image does not receive the instruction to use python3 and defaults to Python 2.7.
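
      If it helps with triage, one possible workaround sketch (not verified against this image) is to point both the driver and the executors at python3 explicitly: PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON are the environment variables PySpark consults, and spark.executorEnv.* / spark.kubernetes.driverEnv.* are the documented prefixes for passing environment variables to the executors and the driver pod.

      import os
      from pyspark import SparkConf

      # Workaround sketch, assuming python3 actually exists in the image:
      # force both the driver and the executors onto python3 instead of
      # relying on the image default.
      os.environ["PYSPARK_PYTHON"] = "python3"
      os.environ["PYSPARK_DRIVER_PYTHON"] = "python3"

      conf = SparkConf()
      conf.set("spark.executorEnv.PYSPARK_PYTHON", "python3")
      conf.set("spark.kubernetes.driverEnv.PYSPARK_PYTHON", "python3")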

       

      The worker container image was built using the following commands:

      mkdir ./tmp
      wget -qO- https://www.mirrorservice.org/sites/ftp.apache.org/spark/spark-3.0.1/spark-3.0.1-bin-hadoop3.2.tgz | tar -C ./tmp/ -xzf -
      cd ./tmp/spark-3.0.1-bin-hadoop3.2/
      ./bin/docker-image-tool.sh -r docker.io/timhughes -t spark-3.0.1-bin-hadoop3.2 -p kubernetes/dockerfiles/spark/bindings/python/Dockerfile build
      docker push docker.io/timhughes/spark-py:spark-3.0.1-bin-hadoop3.2
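
      One way to confirm which interpreters the resulting image actually ships (a sketch; it assumes a local Docker daemon and the image tag built above):

      import subprocess

      image = "docker.io/timhughes/spark-py:spark-3.0.1-bin-hadoop3.2"
      # Start a throwaway container and report which Python binaries are on PATH.
      check = "command -v python python3; python --version 2>&1; python3 --version 2>&1"
      result = subprocess.run(
          ["docker", "run", "--rm", "--entrypoint", "/bin/sh", image, "-c", check],
          capture_output=True,
          text=True,
      )
      print(result.stdout)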

       

      This is the code I am using to initialize the workers:

       

      import os
      from pyspark import SparkContext, SparkConf
      from pyspark.sql import SparkSession

      # Create Spark config for our Kubernetes-based cluster manager
      sparkConf = SparkConf()
      sparkConf.setMaster("k8s://https://kubernetes.default.svc.cluster.local:443")
      sparkConf.setAppName("spark")
      sparkConf.set("spark.kubernetes.container.image", "docker.io/timhughes/spark-py:spark-3.0.1-bin-hadoop3.2")
      sparkConf.set("spark.kubernetes.namespace", "spark")
      sparkConf.set("spark.executor.instances", "2")
      sparkConf.set("spark.executor.cores", "1")
      sparkConf.set("spark.driver.memory", "1024m")
      sparkConf.set("spark.executor.memory", "1024m")
      sparkConf.set("spark.kubernetes.pyspark.pythonVersion", "3")
      sparkConf.set("spark.kubernetes.authenticate.driver.serviceAccountName", "spark")
      sparkConf.set("spark.kubernetes.authenticate.serviceAccountName", "spark")
      sparkConf.set("spark.driver.port", "29413")
      sparkConf.set("spark.driver.host", "my-notebook-deployment.spark.svc.cluster.local")
      # Initialize our Spark cluster; this will actually
      # generate the worker nodes.
      spark = SparkSession.builder.config(conf=sparkConf).getOrCreate()
      sc = spark.sparkContext
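
      A quick way to check which interpreter the executors actually run once the session is up (a sketch, assuming the session above starts successfully):

      import sys

      # Compare the driver's interpreter with the one an executor reports.
      print("driver python:", sys.version)
      executor_python = sc.range(1).map(lambda _: __import__("sys").version).first()
      print("executor python:", executor_python)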
      

       


          People

            Assignee: Unassigned
            Reporter: Tim Hughes
            Votes: 0
            Watchers: 2
