Details
Type: Sub-task
Status: Resolved
Priority: Major
Resolution: Fixed
Description
There are some comments about bin/pig on https://reviews.apache.org/r/45667/#comment198955.
################# ADDING SPARK DEPENDENCIES ##################
# Spark typically works with a single assembly file. However this
# assembly isn't available as an artifact to pull in via ivy.
# To work around this shortcoming, we add all the jars barring
# spark-yarn to DIST through dist-files and then add them to classpath
# of the executors through an independent env variable. The reason
# for excluding spark-yarn is because spark-yarn is already being added
# by the spark-yarn-client via jarOf(Client.Class)
for f in $PIG_HOME/lib/*.jar; do
    if [[ $f == $PIG_HOME/lib/spark-assembly* ]]; then
        # Exclude spark-assembly.jar from shipped jars, but retain in classpath
        SPARK_JARS=${SPARK_JARS}:$f;
    else
        SPARK_JARS=${SPARK_JARS}:$f;
        SPARK_YARN_DIST_FILES=${SPARK_YARN_DIST_FILES},file://$f;
        SPARK_DIST_CLASSPATH=${SPARK_DIST_CLASSPATH}:\${PWD}/`basename $f`
    fi
done
CLASSPATH=${CLASSPATH}:${SPARK_JARS}

export SPARK_YARN_DIST_FILES=`echo ${SPARK_YARN_DIST_FILES} | sed 's/^,//g'`
export SPARK_JARS=${SPARK_YARN_DIST_FILES}
export SPARK_DIST_CLASSPATH
Here we first copy all the Spark dependency jars (e.g. spark-network-shuffle_2.10-1.6.1.jar) to the distributed cache (SPARK_YARN_DIST_FILES) and then add them to the classpath of the executors (SPARK_DIST_CLASSPATH). Actually, we need not copy all these dependency jars to SPARK_DIST_CLASSPATH, because they are all included in spark-assembly.jar, and spark-assembly.jar is uploaded with the Spark job.
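If that holds, the loop above could drop the dist-files bookkeeping entirely and keep the jars only on the local classpath. A minimal sketch of that simplification, reusing the variable names from the snippet above (an illustration of the idea, not the actual patch):

################# ADDING SPARK DEPENDENCIES ##################
# Keep every jar on the local classpath so the Pig client can still
# resolve its dependencies. Nothing is shipped via SPARK_YARN_DIST_FILES
# or SPARK_DIST_CLASSPATH: the individual dependency jars are already
# bundled inside spark-assembly.jar, and the assembly itself is uploaded
# with the Spark job.
for f in $PIG_HOME/lib/*.jar; do
    SPARK_JARS=${SPARK_JARS}:$f;
done
CLASSPATH=${CLASSPATH}:${SPARK_JARS}

With the shipping list gone, the spark-assembly special case in the original if/else also disappears.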