SPARK-32227: Bug in load-spark-env.cmd with Spark 3.0.0


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.0.0
    • Fix Version/s: 3.0.1, 3.1.0
    • Component/s: Spark Shell
    • Labels:
      None
    • Environment:

      Windows 10

      Description

      spark-env.cmd, which is located in conf, is not loaded by load-spark-env.cmd.

       

      How to reproduce:

      1) download Spark 3.0.0 (the build without Hadoop) and extract it

      2) put a file conf/spark-env.cmd with the following contents (the paths are relative to where my Hadoop is, C:\opt\hadoop\hadoop-3.2.1; you may need to change them):

       

      SET JAVA_HOME=C:\opt\Java\jdk1.8.0_241
      SET HADOOP_HOME=C:\opt\hadoop\hadoop-3.2.1
      SET HADOOP_CONF_DIR=C:\opt\hadoop\hadoop-3.2.1\conf
      SET SPARK_DIST_CLASSPATH=C:\opt\hadoop\hadoop-3.2.1\etc\hadoop;C:\opt\hadoop\hadoop-3.2.1\share\hadoop\common;C:\opt\hadoop\hadoop-3.2.1\share\hadoop\common\lib\*;C:\opt\hadoop\hadoop-3.2.1\share\hadoop\common\*;C:\opt\hadoop\hadoop-3.2.1\share\hadoop\hdfs;C:\opt\hadoop\hadoop-3.2.1\share\hadoop\hdfs\lib\*;C:\opt\hadoop\hadoop-3.2.1\share\hadoop\hdfs\*;C:\opt\hadoop\hadoop-3.2.1\share\hadoop\yarn;C:\opt\hadoop\hadoop-3.2.1\share\hadoop\yarn\lib\*;C:\opt\hadoop\hadoop-3.2.1\share\hadoop\yarn\*;C:\opt\hadoop\hadoop-3.2.1\share\hadoop\mapreduce\lib\*;C:\opt\hadoop\hadoop-3.2.1\share\hadoop\mapreduce\*

       

      3) go to the bin directory and run pyspark. You will get an error that log4j cannot be found, etc. (reason: the environment was indeed not loaded, so Spark does not see where Hadoop and all its jars are).

       

      How to fix:

      Just take load-spark-env.cmd from Spark version 2.4.3, and everything will work.

      [UPDATE]: I attached a fixed version of load-spark-env.cmd that works fine.

       

      What is the difference?

      I am not much of a specialist in Windows batch, but defining a subroutine

      :LoadSparkEnv
      if exist "%SPARK_CONF_DIR%\spark-env.cmd" (
        call "%SPARK_CONF_DIR%\spark-env.cmd"
      )

      and then calling it with call :LoadSparkEnv (as was done in 2.4.3) helps.
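
      For context, here is a minimal sketch of how a 2.4.3-style load-spark-env.cmd ties this together. The SPARK_ENV_LOADED guard and the SPARK_CONF_DIR default shown are assumptions based on the standard Spark layout, not the exact contents of the attached file:

      @echo off
      rem Sketch only (assumed layout): load the env file once per shell session
      if [%SPARK_ENV_LOADED%] == [] (
        set SPARK_ENV_LOADED=1
        rem Default SPARK_CONF_DIR to ..\conf relative to this script's directory
        if [%SPARK_CONF_DIR%] == [] (
          set SPARK_CONF_DIR=%~dp0..\conf
        )
        rem "call :Label" runs the subroutine below and then returns here
        call :LoadSparkEnv
      )
      goto :eof

      :LoadSparkEnv
      if exist "%SPARK_CONF_DIR%\spark-env.cmd" (
        call "%SPARK_CONF_DIR%\spark-env.cmd"
      )
      goto :eof

      The key point is the call :LoadSparkEnv line: in batch, "call" with a label invokes the labeled block as a subroutine, so spark-env.cmd actually gets executed, whereas the broken 3.0.0 script never reaches that code path.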

       

       

        Attachments

        1. load-spark-env.cmd
          2 kB
          Ihor Bobak


            People

            • Assignee: ibobak Ihor Bobak
            • Reporter: ibobak Ihor Bobak
            • Votes: 0
            • Watchers: 3
