Pig / PIG-4854 Merge spark branch to trunk / PIG-4903

Avoid adding all Spark dependency jars to SPARK_YARN_DIST_FILES and SPARK_DIST_CLASSPATH

Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: spark
    • Labels: None

    Description

      The review at https://reviews.apache.org/r/45667/#comment198955 raised some comments about the following section of bin/pig:

      ################# ADDING SPARK DEPENDENCIES ##################
      # Spark typically works with a single assembly file. However this
      # assembly isn't available as a artifact to pull in via ivy.
      # To work around this short coming, we add all the jars barring
      # spark-yarn to DIST through dist-files and then add them to classpath
      # of the executors through an independent env variable. The reason
      # for excluding spark-yarn is because spark-yarn is already being added
      # by the spark-yarn-client via jarOf(Client.Class)
      for f in $PIG_HOME/lib/*.jar; do
          if [[ $f == $PIG_HOME/lib/spark-assembly* ]]; then
              # Exclude spark-assembly.jar from shipped jars, but retain in classpath
              SPARK_JARS=${SPARK_JARS}:$f;
          else
              SPARK_JARS=${SPARK_JARS}:$f;
              SPARK_YARN_DIST_FILES=${SPARK_YARN_DIST_FILES},file://$f;
              SPARK_DIST_CLASSPATH=${SPARK_DIST_CLASSPATH}:\${PWD}/`basename $f`
          fi
      done
      CLASSPATH=${CLASSPATH}:${SPARK_JARS}
      
      export SPARK_YARN_DIST_FILES=`echo ${SPARK_YARN_DIST_FILES} | sed 's/^,//g'`
      export SPARK_JARS=${SPARK_YARN_DIST_FILES}
      export SPARK_DIST_CLASSPATH
      

      Here we first copy every Spark dependency jar, such as the spark-network-shuffle_2.10-1.6.1 jar, to the distributed cache (SPARK_YARN_DIST_FILES) and then add each of them to the executor classpath (SPARK_DIST_CLASSPATH). We do not actually need to add all of these dependency jars to SPARK_YARN_DIST_FILES and SPARK_DIST_CLASSPATH, because they are all included in spark-assembly.jar, and spark-assembly.jar is uploaded with the Spark job.
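
As an illustration only (this is not the content of the attached patches), the idea could be sketched as follows: keep every jar on the client-side CLASSPATH, but stop appending them to SPARK_YARN_DIST_FILES and SPARK_DIST_CLASSPATH. The helper name `collect_local_jars` and the throwaway demo directory are hypothetical; in bin/pig the directory would be $PIG_HOME/lib, and the sketch assumes the executors really do get these classes from spark-assembly.jar.

```shell
# Sketch only: collect_local_jars is a hypothetical helper, not part of bin/pig.
# Prints a colon-separated list of every jar found in the given directory.
collect_local_jars() {
    lib_dir=$1
    jars=""
    for f in "$lib_dir"/*.jar; do
        [ -e "$f" ] || continue          # the glob matched nothing
        jars=${jars:+$jars:}$f
    done
    printf '%s\n' "$jars"
}

# Demo against a throwaway directory standing in for $PIG_HOME/lib.
demo_lib=$(mktemp -d)
touch "$demo_lib/spark-assembly-demo.jar" \
      "$demo_lib/spark-network-shuffle_2.10-1.6.1.jar"

SPARK_JARS=$(collect_local_jars "$demo_lib")
CLASSPATH=${CLASSPATH:+$CLASSPATH:}$SPARK_JARS   # client-side classpath only

# Nothing is appended to SPARK_YARN_DIST_FILES or SPARK_DIST_CLASSPATH here:
# the dependency jars are assumed to be covered by spark-assembly.jar.
echo "$SPARK_JARS"
```

Compared with the loop above, the per-jar `file://` entries and the `${PWD}/<basename>` executor classpath entries are gone, so no extra dependency jars are added to the distributed cache for each job.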

      Attachments

        1. PIG-4903_5.patch (3 kB, liyunzhang)
        2. PIG-4903_4.patch (3 kB, liyunzhang)
        3. PIG-4903_3.patch (3 kB, liyunzhang)
        4. PIG-4903_2.patch (3 kB, liyunzhang)
        5. PIG-4903_1.patch (3 kB, liyunzhang)
        6. PIG-4903.patch (2 kB, liyunzhang)

          People

            Assignee: Unassigned
            Reporter: liyunzhang (kellyzly)
            Votes: 0
            Watchers: 5