Spark / SPARK-16110

Can't set Python via spark-submit for YARN cluster mode when PYSPARK_PYTHON & PYSPARK_DRIVER_PYTHON are set

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.6.1
    • Fix Version/s: 2.1.0
    • Component/s: PySpark, Spark Core, YARN
    • Environment: Ubuntu 14.04.4 LTS (GNU/Linux 4.2.0-38-generic x86_64), Spark 1.6.1, Azure HDInsight 3.4

    Description

      When a cluster has the PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON environment variables set (needed to use a non-system Python, e.g. /usr/bin/anaconda/bin/python), you cannot override them per submission in YARN cluster mode.

      When using spark-submit (in this case via Livy) to submit with an override:
      spark-submit --master yarn --deploy-mode cluster --conf 'spark.yarn.appMasterEnv.PYSPARK_DRIVER_PYTHON=python3' --conf 'spark.yarn.appMasterEnv.PYSPARK_PYTHON=python3' probe.py
      the environment variable values override the conf settings. For some users a workaround is to unset the env vars, but that is not always possible (e.g. when submitting a batch via Livy, where you can only pass parameters through to spark-submit).
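For anyone generating the submission programmatically, the command above can be assembled as an argument list so that each --conf pair stays a single, correctly quoted token. This is only a sketch: build_submit_command is a hypothetical helper, not part of Spark or Livy, and it only builds the command, it does not run it.

```python
import shlex


def build_submit_command(script, python_exe, master="yarn", deploy_mode="cluster"):
    """Build a spark-submit argv that requests python_exe via appMasterEnv conf.

    Hypothetical helper for illustration; the conf keys are the ones used
    in the report above.
    """
    conf = {
        "spark.yarn.appMasterEnv.PYSPARK_DRIVER_PYTHON": python_exe,
        "spark.yarn.appMasterEnv.PYSPARK_PYTHON": python_exe,
    }
    argv = ["spark-submit", "--master", master, "--deploy-mode", deploy_mode]
    for key, value in conf.items():
        # Each setting is passed as its own "--conf key=value" pair.
        argv += ["--conf", f"{key}={value}"]
    argv.append(script)
    return argv


command = build_submit_command("probe.py", "python3")
# shlex.join renders the list as a shell-safe string for logging or copy/paste.
print(shlex.join(command))
```

Building a list rather than a single string avoids shell-quoting mistakes around the --conf arguments.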

      The expectation is that the conf values above override the environment variables.

      The fix is to change the order in which conf and env vars are applied in the YARN client, so that the conf values win.
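The ordering issue can be illustrated with a small model. This is not the actual Spark YARN client code (which is written in Scala); build_am_env and its parameters are hypothetical names, and the sketch only assumes that the launch environment is a dictionary populated in two steps, with whichever step runs last taking precedence.

```python
AM_ENV_PREFIX = "spark.yarn.appMasterEnv."


def build_am_env(process_env, spark_conf,
                 forwarded=("PYSPARK_PYTHON", "PYSPARK_DRIVER_PYTHON")):
    """Model of the desired precedence: conf entries override inherited env vars.

    Hypothetical illustration only, not Spark's implementation.
    """
    env = {}
    # Step 1: start from the Python-related variables set on the cluster.
    for name in forwarded:
        if name in process_env:
            env[name] = process_env[name]
    # Step 2: apply spark.yarn.appMasterEnv.* settings last, so a
    # per-submission value replaces the cluster-wide default.
    for key, value in spark_conf.items():
        if key.startswith(AM_ENV_PREFIX):
            env[key[len(AM_ENV_PREFIX):]] = value
    return env


cluster_env = {
    "PYSPARK_PYTHON": "/usr/bin/anaconda/bin/python",
    "PYSPARK_DRIVER_PYTHON": "/usr/bin/anaconda/bin/python",
}
submitted_conf = {
    "spark.yarn.appMasterEnv.PYSPARK_PYTHON": "python3",
    "spark.yarn.appMasterEnv.PYSPARK_DRIVER_PYTHON": "python3",
}
am_env = build_am_env(cluster_env, submitted_conf)
print(am_env)
```

Swapping the two steps in this model reproduces the reported behaviour: the cluster-wide values would overwrite the per-submission ones.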

      Related discussion: https://issues.cloudera.org/browse/LIVY-159

      Backporting this to 1.6 would be great and would unblock me.

            People

              Assignee: Kevin Grealish (KevinGre)
              Reporter: Kevin Grealish (KevinGre)
              Votes: 0
              Watchers: 5

                Time Tracking

                  Estimated: 2h
                  Remaining: 2h
                  Logged: Not Specified