Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version: 3.0.0
Fix Version: None
Environment: Windows 10
Description
spark-env.cmd, which is located in the conf directory, is not loaded by load-spark-env.cmd.
How to reproduce:
1) download Spark 3.0.0 (the "without Hadoop" build) and extract it
2) put a file conf/spark-env.cmd with the following contents (the paths match where my Hadoop is installed, C:\opt\hadoop\hadoop-3.2.1; you may need to change them - a sketch for deriving SPARK_DIST_CLASSPATH automatically follows these steps):
SET JAVA_HOME=C:\opt\Java\jdk1.8.0_241
SET HADOOP_HOME=C:\opt\hadoop\hadoop-3.2.1
SET HADOOP_CONF_DIR=C:\opt\hadoop\hadoop-3.2.1\conf
SET SPARK_DIST_CLASSPATH=C:\opt\hadoop\hadoop-3.2.1\etc\hadoop;C:\opt\hadoop\hadoop-3.2.1\share\hadoop\common;C:\opt\hadoop\hadoop-3.2.1\share\hadoop\common\lib\*;C:\opt\hadoop\hadoop-3.2.1\share\hadoop\common\*;C:\opt\hadoop\hadoop-3.2.1\share\hadoop\hdfs;C:\opt\hadoop\hadoop-3.2.1\share\hadoop\hdfs\lib\*;C:\opt\hadoop\hadoop-3.2.1\share\hadoop\hdfs\*;C:\opt\hadoop\hadoop-3.2.1\share\hadoop\yarn;C:\opt\hadoop\hadoop-3.2.1\share\hadoop\yarn\lib\*;C:\opt\hadoop\hadoop-3.2.1\share\hadoop\yarn\*;C:\opt\hadoop\hadoop-3.2.1\share\hadoop\mapreduce\lib\*;C:\opt\hadoop\hadoop-3.2.1\share\hadoop\mapreduce\*
3) go to the bin directory and run pyspark. You will get an error that log4j cannot be found, etc. (reason: the environment was indeed not loaded, so Spark does not see where Hadoop and all its jars are).
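As an aside, instead of hard-coding the long SPARK_DIST_CLASSPATH value in step 2, it can be derived from the output of the hadoop classpath command (the approach Spark's "Hadoop free" build documentation suggests on Linux). A minimal Windows batch sketch for conf\spark-env.cmd, assuming hadoop.cmd lives under %HADOOP_HOME%\bin as in the paths above:
rem Sketch only: derive SPARK_DIST_CLASSPATH from "hadoop classpath" instead of hard-coding it.
rem Assumes HADOOP_HOME is already set as in step 2 above.
SET HADOOP_HOME=C:\opt\hadoop\hadoop-3.2.1
rem "hadoop classpath" prints the full classpath (including \* wildcards) on one line.
for /f "delims=" %%i in ('%HADOOP_HOME%\bin\hadoop.cmd classpath') do SET SPARK_DIST_CLASSPATH=%%i
This only helps once load-spark-env.cmd actually calls spark-env.cmd, which is the bug described here.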
How to fix:
just take the load-spark-env.cmd from Spark version 2.4.3, and everything will work.
[UPDATE]: I attached a fixed version of load-spark-env.cmd that works fine.
What is the difference?
I am not an expert in Windows batch scripting, but defining a subroutine
:LoadSparkEnv
rem Load the user-provided environment file if it exists
if exist "%SPARK_CONF_DIR%\spark-env.cmd" (
  call "%SPARK_CONF_DIR%\spark-env.cmd"
)
and then calling it (as was done in 2.4.3) helps.
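For context, this is roughly how the label-plus-call pattern fits into a script like load-spark-env.cmd. It is only a sketch of the pattern, not the actual 2.4.3 script or the attached file; the default for SPARK_CONF_DIR and the surrounding lines are assumptions:
@echo off
rem Sketch only: illustrate the subroutine pattern, not the real load-spark-env.cmd.
rem Default SPARK_CONF_DIR to the conf directory next to bin (an assumption).
if not defined SPARK_CONF_DIR (
  set SPARK_CONF_DIR=%~dp0..\conf
)

rem Invoke the subroutine; execution returns here afterwards.
call :LoadSparkEnv

rem ... the rest of the script would continue here ...
exit /b 0

:LoadSparkEnv
if exist "%SPARK_CONF_DIR%\spark-env.cmd" (
  rem "call" runs spark-env.cmd so its SET statements stay in this environment.
  call "%SPARK_CONF_DIR%\spark-env.cmd"
)
exit /b 0
The key point is that the script reaches a call to the :LoadSparkEnv label, so the if exist / call block is actually executed rather than skipped.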