Hadoop YARN / YARN-9549

Unable to run the PySpark driver in a Docker container on YARN 3

Details

    • Type: Bug
    • Status: Open
    • Priority: Critical
    • Resolution: Unresolved
    • Affects Version/s: 3.1.2
    • Fix Version/s: None
    • Component/s: yarn
    • Labels: None
    • Environment:

      Hadoop 3.1.1.3.1.0.0-78

      Spark version 2.3.2.3.1.0.0-78

      Using Scala version 2.11.8, Java HotSpot(TM) 64-Bit Server VM, 1.8.0_211

      Server: Docker Engine - Community, Version: 18.09.6

    Description

      I followed https://hadoop.apache.org/docs/r3.2.0/hadoop-yarn/hadoop-yarn-site/DockerContainers.html to build a Spark Docker image for running PySpark. I could not find good documentation describing how to submit a PySpark job with spark-submit to a Hadoop 3 cluster, so I used the command below to launch my simple Python job:

      PYSPARK_DRIVER_PYTHON=/usr/local/bin/python3.5 spark-submit --master yarn --deploy-mode cluster --num-executors 3 --executor-memory 1g --conf spark.executorEnv.YARN_CONTAINER_RUNTIME_TYPE=docker --conf spark.executorEnv.YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=local/spark:v1.0.8 --conf spark.yarn.AppMasterEnv.YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=local/spark:v1.0.8 --conf spark.yarn.AppMasterEnv.YARN_CONTAINER_RUNTIME_TYPE=docker ./test.py
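
For reference, here is a minimal sketch of assembling an equivalent submission from Python so that the executor and application-master settings stay in sync. The helper names (`docker_env_conf`, `build_submit_command`) are hypothetical and not part of Spark. One assumption is worth flagging: Spark's documentation spells the application-master property prefix `spark.yarn.appMasterEnv.` with a lowercase "a", whereas the command above uses `spark.yarn.AppMasterEnv.`. Spark property names are case-sensitive as far as I know, so this difference may be related to the behavior described below. The sketch uses the documented spelling and omits the `PYSPARK_DRIVER_PYTHON` environment variable.

```python
import shlex

# Image tag taken from the command in this report.
IMAGE = "local/spark:v1.0.8"


def docker_env_conf(prefix, image):
    """Return --conf pairs that ask YARN to use the Docker runtime.

    `prefix` selects whose environment is set, e.g. "spark.executorEnv"
    or "spark.yarn.appMasterEnv" (documented spelling, lowercase "a").
    """
    return [
        "--conf", f"{prefix}.YARN_CONTAINER_RUNTIME_TYPE=docker",
        "--conf", f"{prefix}.YARN_CONTAINER_RUNTIME_DOCKER_IMAGE={image}",
    ]


def build_submit_command(script, image=IMAGE):
    """Build the spark-submit argument list for a cluster-mode PySpark job."""
    cmd = [
        "spark-submit",
        "--master", "yarn",
        "--deploy-mode", "cluster",
        "--num-executors", "3",
        "--executor-memory", "1g",
    ]
    cmd += docker_env_conf("spark.executorEnv", image)
    # In cluster mode the driver runs inside the application master,
    # so its container runtime is controlled by the appMasterEnv settings.
    cmd += docker_env_conf("spark.yarn.appMasterEnv", image)
    cmd.append(script)
    return cmd


if __name__ == "__main__":
    # Print a shell-quoted command line for inspection.
    print(" ".join(shlex.quote(arg) for arg in build_submit_command("./test.py")))
```

The resulting list can be passed to `subprocess.run` on a host where `spark-submit` is on the `PATH`.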

       

      test.py simply collects the hostname from each executor and checks whether the Python job is running in a container.
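
The attached test.py is not reproduced here. As an illustration only, a check of this kind could be sketched as follows. The function names are hypothetical, and the detection heuristic (the presence of `/.dockerenv`, or the string "docker" in `/proc/1/cgroup`) is an assumption that may not hold on every Docker or cgroup configuration.

```python
import os
import socket


def in_container(dockerenv="/.dockerenv", cgroup="/proc/1/cgroup"):
    """Heuristically report whether this process runs inside a Docker container.

    The two paths are parameters so the check can be exercised against
    fixture files; the defaults are the usual locations on Linux.
    """
    if os.path.exists(dockerenv):
        return True
    try:
        with open(cgroup) as handle:
            return "docker" in handle.read()
    except OSError:
        # File missing or unreadable: assume we are not in a container.
        return False


def probe(_=None):
    """Return (hostname, in_container) for the process that runs this call."""
    return socket.gethostname(), in_container()


def probe_executors(sc, tasks=9):
    """Run probe() in `tasks` Spark tasks and collect the results.

    `sc` is an existing pyspark SparkContext supplied by the caller.
    """
    return sc.parallelize(range(tasks), tasks).map(probe).collect()
```

Calling `probe()` in the driver and `probe_executors(sc)` on the cluster would give one result for the driver and one per task, which is the kind of comparison shown in the output below.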

      I found that the driver always runs directly on the host, not in the container. As a result, we need to keep the Python version in the Docker image consistent with the one on the NodeManager hosts, which makes it pointless to use Docker to package all the dependencies.

       

      The Spark job runs successfully; its standard output is below:

      Log Type: stdout

      Log Upload Time: Tue May 14 02:07:06 +0000 2019

      Log Length: 141

      host.test.com

      False ============>going to print all the container names. [True, True, True, True, True, True, True, True, True]

      Please see the attached Dockerfile and test.py.

       

      Attachments

        1. test.py (0.7 kB, Jack Zhu)
        2. Dockerfile (2 kB, Jack Zhu)
        3. yarn-site.xml (23 kB, Jack Zhu)

          People

            Assignee: Unassigned
            Reporter: Jack Zhu (smilehahohi)