[SPARK-32227] Bug in load-spark-env.cmd with Spark 3.0.0


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.0.0
    • Fix Version/s: 3.0.1, 3.1.0
    • Component/s: Spark Shell
    • Labels: None
    • Environment: Windows 10

    Description

      spark-env.cmd, which is located in conf, is not loaded by load-spark-env.cmd.

       

      How to reproduce:

      1) Download Spark 3.0.0 without Hadoop and extract it.

      2) Put a file conf\spark-env.cmd in place with the following contents (the paths correspond to my Hadoop installation in C:\opt\hadoop\hadoop-3.2.1, so you may need to change them; a way to generate the classpath value is noted after the steps):

       

      SET JAVA_HOME=C:\opt\Java\jdk1.8.0_241
      SET HADOOP_HOME=C:\opt\hadoop\hadoop-3.2.1
      SET HADOOP_CONF_DIR=C:\opt\hadoop\hadoop-3.2.1\conf
      SET SPARK_DIST_CLASSPATH=C:\opt\hadoop\hadoop-3.2.1\etc\hadoop;C:\opt\hadoop\hadoop-3.2.1\share\hadoop\common\lib\*;C:\opt\hadoop\hadoop-3.2.1\share\hadoop\common\*;C:\opt\hadoop\hadoop-3.2.1\share\hadoop\hdfs;C:\opt\hadoop\hadoop-3.2.1\share\hadoop\hdfs\lib\*;C:\opt\hadoop\hadoop-3.2.1\share\hadoop\hdfs\*;C:\opt\hadoop\hadoop-3.2.1\share\hadoop\yarn;C:\opt\hadoop\hadoop-3.2.1\share\hadoop\yarn\lib\*;C:\opt\hadoop\hadoop-3.2.1\share\hadoop\yarn\*;C:\opt\hadoop\hadoop-3.2.1\share\hadoop\mapreduce\lib\*;C:\opt\hadoop\hadoop-3.2.1\share\hadoop\mapreduce\*

       

      3) Go to the bin directory and run pyspark. You will get an error that log4j cannot be found, among others (the reason: the environment was indeed not loaded, so Spark does not see where Hadoop and all of its jars are).
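
      One way to confirm that the loader itself is at fault: open a fresh cmd session, call bin\load-spark-env.cmd directly, and echo the variables that conf\spark-env.cmd is supposed to set (the Spark path below is only an example, adjust it to wherever you extracted the archive):

      rem example path, adjust to your extracted Spark directory
      cd /d C:\spark-3.0.0-bin-without-hadoop
      call bin\load-spark-env.cmd
      rem with the buggy 3.0.0 loader these variables remain unset
      echo HADOOP_HOME=%HADOOP_HOME%
      echo SPARK_DIST_CLASSPATH=%SPARK_DIST_CLASSPATH%

      With the 3.0.0 script the variables stay unset; with a working loader they echo the values from conf\spark-env.cmd.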

       
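      A side note on the SPARK_DIST_CLASSPATH value in step 2: it mirrors what the hadoop classpath command prints for this installation, so instead of maintaining the long list by hand you can let spark-env.cmd generate it (a sketch, assuming hadoop.cmd is runnable at that path):

      rem capture the output of "hadoop classpath" into SPARK_DIST_CLASSPATH
      FOR /F "tokens=*" %%i IN ('C:\opt\hadoop\hadoop-3.2.1\bin\hadoop classpath') DO SET SPARK_DIST_CLASSPATH=%%i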

      How to fix:

      Just take load-spark-env.cmd from Spark version 2.4.3, and everything will work.

      [UPDATE]: I have attached a fixed version of load-spark-env.cmd that works fine.

       

      What is the difference?

      I am not much of a specialist in Windows batch, but defining a subroutine

      :LoadSparkEnv
      if exist "%SPARK_CONF_DIR%\spark-env.cmd" (
        call "%SPARK_CONF_DIR%\spark-env.cmd"
      )

      and then calling it (as it was in 2.4.3) helps.
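
      My understanding of why the subroutine matters: cmd expands %VAR% references inside a parenthesized ( ... ) block once, when the whole block is parsed, so a script that sets SPARK_CONF_DIR inside an "if" block and then tests "%SPARK_CONF_DIR%\spark-env.cmd" within the same block still sees the old, empty value. A "call :Label" subroutine is parsed separately, so its variables are expanded only when it runs, after the value has been set. A minimal demo of the pitfall (the names are made up for the demo, nothing Spark-specific):

      @echo off
      set "CONF_DIR="
      if "%CONF_DIR%"=="" (
        set "CONF_DIR=%~dp0conf"
        rem parse-time expansion: CONF_DIR still reads as empty here
        echo inside block: "%CONF_DIR%\spark-env.cmd"
        rem the subroutine is expanded at call time and sees the new value
        call :ShowConf
      )
      exit /b

      :ShowConf
      echo in subroutine: "%CONF_DIR%\spark-env.cmd"
      exit /b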


      Attachments

        1. load-spark-env.cmd (2 kB, attached by Ihor Bobak)

      People

        Assignee: Ihor Bobak
        Reporter: Ihor Bobak
