  1. Kudu
  2. KUDU-1817

Failed to execute spark-shell with kudu-spark2 package

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 1.1.0
    • Fix Version/s: 1.2.0
    • Component/s: client, spark
    • Labels:
      None

      Description

      I tried to run

      spark-shell --packages org.apache.kudu:kudu-spark2_2.11:1.1.0

      and it failed with the following error message:

      :: USE VERBOSE OR DEBUG MESSAGE LEVEL FOR MORE DETAILS

      Exception in thread "main" java.lang.RuntimeException: unresolved dependency: org.apache.kudu#kudu-spark2_2.11;1.1.0: java.text.ParseException: inconsistent module descriptor file found in 'https://repo1.maven.org/maven2/org/apache/kudu/kudu-spark2_2.11/1.1.0/kudu-spark2_2.11-1.1.0.pom': bad module name: expected='kudu-spark2_2.11' found='kudu-spark_2.10';
          at org.apache.spark.deploy.SparkSubmitUtils$.resolveMavenCoordinates(SparkSubmit.scala:1076)
          at org.apache.spark.deploy.SparkSubmit$.prepareSubmitEnvironment(SparkSubmit.scala:294)
          at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:158)
          at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:124)
          at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

      I checked http://repo1.maven.org/maven2/org/apache/kudu/kudu-spark2_2.11/1.1.0/kudu-spark2_2.11-1.1.0.pom.

      The artifactId in pom.xml is

      kudu-${spark.version.label}_${scala.binary.version}

      As the properties are

      <properties>
      <scala.version>2.10.4</scala.version>
      <compat.src>src/main/spark1</compat.src>
      <scala.binary.version>2.10</scala.binary.version>
      <spark.version.label>spark</spark.version.label>
      <top.dir>${project.basedir}/..</top.dir>
      <spark.version>1.6.1</spark.version>
      </properties>
      

      So it will be translated to `kudu-spark_2.10`, which is the name reported as found in the error above.
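To make the substitution concrete, here is a minimal Python sketch of how the artifactId template resolves with the default property values quoted above. The `interpolate` helper is hypothetical and only a simplified stand-in for Maven's property interpolation, not Maven's actual implementation.

```python
import re

# Default <properties> values quoted from the published pom (spark1 defaults).
DEFAULT_PROPERTIES = {
    "scala.binary.version": "2.10",
    "spark.version.label": "spark",
}


def interpolate(template, properties):
    """Replace each ${name} with its value; leave unknown placeholders untouched."""
    return re.sub(
        r"\$\{([^}]+)\}",
        lambda m: properties.get(m.group(1), m.group(0)),
        template,
    )


artifact_id = interpolate(
    "kudu-${spark.version.label}_${scala.binary.version}", DEFAULT_PROPERTIES
)
print(artifact_id)  # kudu-spark_2.10
```

Because the published pom carries only the default values, a consumer resolving the template gets the spark1 name regardless of which profile was active at build time.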

      Having checked the code, I think this is caused by how the Maven Shade Plugin generates the pom file.

      When running

      mvn clean package -P spark2_2.11

      the Shade Plugin generates a dependency-reduced-pom.xml, which is later used for the released pom file.

      In dependency-reduced-pom.xml, the Shade Plugin explicitly resolves properties only for the dependencies. It does not resolve them in the artifactId or in plugin configurations. So we see

      kudu-${spark.version.label}_${scala.binary.version}

      in the release.

      This causes a problem when other applications try to load the package, because the default property values are the ones for spark1, and the spark2 profile values are not applied when the package is loaded.
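The mismatch can be reproduced offline with a small consistency check. The sketch below is an assumption-laden illustration, not the resolver's real code: the embedded pom is abbreviated to the elements discussed in this issue, and `resolved_artifact_id` and `check_module_name` are hypothetical helpers that only mimic the wording of the error reported above.

```python
import re
import xml.etree.ElementTree as ET

NS = {"m": "http://maven.apache.org/POM/4.0.0"}

# Abbreviated, illustrative pom: only the elements discussed in this issue.
POM = """<project xmlns="http://maven.apache.org/POM/4.0.0">
  <groupId>org.apache.kudu</groupId>
  <artifactId>kudu-${spark.version.label}_${scala.binary.version}</artifactId>
  <version>1.1.0</version>
  <properties>
    <scala.binary.version>2.10</scala.binary.version>
    <spark.version.label>spark</spark.version.label>
  </properties>
</project>"""


def resolved_artifact_id(pom_text):
    """Resolve ${...} placeholders in <artifactId> using the pom's own <properties>."""
    root = ET.fromstring(pom_text)
    props = {}
    props_el = root.find("m:properties", NS)
    if props_el is not None:
        for child in props_el:
            # Strip the "{namespace}" prefix ElementTree puts on tag names.
            name = child.tag.split("}", 1)[-1]
            props[name] = (child.text or "").strip()
    raw = root.findtext("m:artifactId", default="", namespaces=NS)
    return re.sub(
        r"\$\{([^}]+)\}", lambda m: props.get(m.group(1), m.group(0)), raw
    )


def check_module_name(pom_text, expected):
    """Return an error string if the resolved name differs from the requested one."""
    found = resolved_artifact_id(pom_text)
    if found != expected:
        return f"bad module name: expected='{expected}' found='{found}'"
    return None


print(check_module_name(POM, "kudu-spark2_2.11"))
```

Running a check like this against a generated pom before release would flag any artifactId that still depends on profile-specific properties.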

      Since pom files are supposed to be static in Maven, a quick fix would be to create two new modules (spark1 and spark2) and build them separately.

      Please let me know your comments. Thanks.

              People

              • Assignee:
                junhe Jun He
                Reporter:
                junhe Jun He
              • Votes:
                0
                Watchers:
                2
