Spark / SPARK-19630

spark.jars.ivy explanation is incorrect and misleading.


Details

    • Type: Documentation
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 1.6.2, 1.6.3, 2.0.0, 2.0.1, 2.0.2, 2.1.0
    • Fix Version/s: None
    • Component/s: Documentation
    • Labels: None

    Description

      Based on the Spark documentation (http://spark.apache.org/docs/latest/configuration.html), I expected this property to let me configure custom repositories for spark-shell:

      _spark.jars.ivy Comma-separated list of additional remote repositories to search for the coordinates given with spark.jars.packages._

      Using, for example, this configuration:

      _spark.jars.ivy https://oss.sonatype.org/content/repositories/snapshots/,http://bits.netbeans.org/maven2/,https://maven.atlassian.com/repository/public/_

      leads to an error that initially made little sense to me:

      Ivy Default Cache set to: /sparkws/bin/https:/oss.sonatype.org/content/repositories/snapshots/,http:/bits.netbeans.org/maven2/,https:/maven.atlassian.com/repository/public/cache
      The jars for the packages stored in: https:/oss.sonatype.org/content/repositories/snapshots/,http:/bits.netbeans.org/maven2/,https:/maven.atlassian.com/repository/public/jars
      Exception in thread "main" java.lang.IllegalArgumentException: basedir must be absolute: https:/oss.sonatype.org/content/repositories/snapshots/,http:/bits.netbeans.org/maven2/,https:/maven.atlassian.com/repository/public/local
      at org.apache.ivy.util.Checks.checkAbsolute(Checks.java:48)

      I ran spark-shell from the folder /sparkws/bin, so it looks to me as if the property points to a local path where Ivy caches data. The docs, however, say something different: "Comma-separated list of additional remote repositories to search for the coordinates given with spark.jars.packages".

      Could this be a documentation bug?

      Further tests showed that the setting:

      _spark.jars.ivy /sparkws/bin/jars_

      is in fact used by Ivy as the directory for storing the dynamically loaded JAR files, not as a list of remote repository URLs.
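Both results are consistent with the whole spark.jars.ivy value being used as a single directory path. The following Python sketch mimics that interpretation, assuming java.io.File-style normalization on Unix (duplicate slashes collapsed, trailing slash dropped), approximated here with `pathlib.PurePosixPath`. `ivy_dirs` is a hypothetical helper, not part of Spark or Ivy; the cache/jars/local subdirectory names are taken from the log output in this report.

```python
from pathlib import PurePosixPath


def ivy_dirs(ivy_setting: str, cwd: str) -> dict:
    """Hypothetical helper: derive the directories seen in the log from a
    spark.jars.ivy value, treating that value as one file-system path."""
    # PurePosixPath collapses "//" into "/" and drops a trailing "/",
    # similar to java.io.File path normalization on Unix (an assumption here).
    base = PurePosixPath(ivy_setting)
    return {
        # The log prints the cache directory resolved against the working dir;
        # if base is already absolute, the cwd prefix is discarded.
        "cache": str(PurePosixPath(cwd) / base / "cache"),
        # The log prints the jars and local directories without resolution.
        "jars": str(base / "jars"),
        "local": str(base / "local"),
        # Ivy's Checks.checkAbsolute rejects a relative basedir.
        "is_absolute": base.is_absolute(),
    }


# The comma-separated repository list from this report, used as spark.jars.ivy.
repos = ("https://oss.sonatype.org/content/repositories/snapshots/,"
         "http://bits.netbeans.org/maven2/,"
         "https://maven.atlassian.com/repository/public/")

for setting in (repos, "/sparkws/bin/jars"):
    for name, value in ivy_dirs(setting, "/sparkws/bin").items():
        print(f"{name}: {value}")
```

For the repository list, the derived cache, jars, and local paths correspond to the three log lines quoted earlier and the base path is not absolute, which is why Ivy fails; for /sparkws/bin/jars they correspond to the cache and jars lines from the working run.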

      Locally, we found a working solution.

      We added this CLI option to the spark-shell start command:

      --repositories (COMMA SEPARATED LIST OF REPOS)

      This approach worked well:
      _Ivy Default Cache set to: /sparkws/bin/jars/cache
      The jars for the packages stored in: /sparkws/bin/jars/jars
      http://bits.netbeans.org/maven2/ added as a remote repository with the name: repo-1
      https://oss.sonatype.org/content/repositories/snapshots/ added as a remote repository with the name: repo-2_
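The "added as a remote repository with the name: repo-1" lines above match what Spark prints for repositories passed with spark-shell's `--repositories` option, which keeps the local Ivy directory and the remote repository URLs in separate settings. A minimal Python sketch that assembles such a command line; `spark_shell_args` and its validation rules are hypothetical, and the `--packages` coordinate in the usage example is a placeholder, not a real artifact.

```python
import shlex


def spark_shell_args(ivy_dir: str, repositories: list[str], packages: list[str]) -> list[str]:
    """Hypothetical helper: build a spark-shell argument list that keeps the
    local Ivy directory and the remote repository URLs apart."""
    # spark.jars.ivy names a local directory, so reject URLs and relative paths.
    if "://" in ivy_dir or not ivy_dir.startswith("/"):
        raise ValueError(f"spark.jars.ivy must be an absolute local path: {ivy_dir!r}")
    # Remote repositories are URLs (this sketch only accepts http/https).
    for url in repositories:
        if not url.startswith(("http://", "https://")):
            raise ValueError(f"not an http(s) repository URL: {url!r}")
    args = ["spark-shell", "--conf", f"spark.jars.ivy={ivy_dir}"]
    if repositories:
        # --repositories takes a comma-separated list of URLs.
        args += ["--repositories", ",".join(repositories)]
    if packages:
        # --packages takes comma-separated Maven coordinates (group:artifact:version).
        args += ["--packages", ",".join(packages)]
    return args


args = spark_shell_args(
    "/sparkws/bin/jars",
    ["http://bits.netbeans.org/maven2/",
     "https://oss.sonatype.org/content/repositories/snapshots/"],
    ["com.example:example-lib:0.1.0"],  # hypothetical placeholder coordinate
)
print(shlex.join(args))
```

With this split, passing the repository list as spark.jars.ivy, as in the failing configuration, is rejected up front instead of surfacing later as Ivy's "basedir must be absolute" error.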

People

    • Assignee: Unassigned
    • Reporter: Mirko Kaempf (mirko.kaempf@cloudera.com)
    • Votes: 0
    • Watchers: 2