  Spark / SPARK-25026

Binary releases should contain some copy of compiled external integration modules


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Won't Fix
    • Affects Version/s: 2.4.0
    • Fix Version/s: None
    • Component/s: Build, Structured Streaming
    • Labels: None
    Description:
      I noted that a user was having trouble running a Spark Streaming + Kafka app on Spark 2.3.0 standalone, launched from a binary release. The app claimed not to find Spark-Kafka-related classes. I was surprised, then, when I looked at the contents of the 2.3.1 and 2.2.2 releases and found that jars/ contained:

      spark-catalyst_2.11-2.3.1.jar
      spark-core_2.11-2.3.1.jar
      spark-graphx_2.11-2.3.1.jar
      spark-hive-thriftserver_2.11-2.3.1.jar
      spark-hive_2.11-2.3.1.jar
      spark-kubernetes_2.11-2.3.1.jar
      spark-kvstore_2.11-2.3.1.jar
      spark-launcher_2.11-2.3.1.jar
      spark-mesos_2.11-2.3.1.jar
      spark-mllib-local_2.11-2.3.1.jar
      spark-mllib_2.11-2.3.1.jar
      spark-network-common_2.11-2.3.1.jar
      spark-network-shuffle_2.11-2.3.1.jar
      spark-repl_2.11-2.3.1.jar
      spark-sketch_2.11-2.3.1.jar
      spark-sql_2.11-2.3.1.jar
      spark-streaming_2.11-2.3.1.jar
      spark-tags_2.11-2.3.1.jar
      spark-unsafe_2.11-2.3.1.jar
      spark-yarn_2.11-2.3.1.jar

      No spark-streaming-kafka or spark-sql-kafka modules. While I still feel I might be missing a reason for this, it really doesn't seem correct: Spark-Kafka apps won't work out of the box right now, and we ship other integrations for modules that are even off by default in the build.
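
      For reference, the usual workaround is to pull the integration in at submit time with --packages; the application class and jar below are placeholders, and the coordinates assume a Scala 2.11 / 2.3.1 build:

        spark-submit \
          --packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.3.1 \
          --class com.example.MyKafkaApp \
          my-app.jar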

      The make-distribution.sh script does not appear to try to copy these JARs. Shouldn't it?
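
      For illustration, a minimal sketch of the kind of copy step it could add, assuming the script's usual SPARK_HOME/DISTDIR variables and the external/kafka-0-10* module layout (a sketch, not the script's actual code):

        # Hypothetical addition to make-distribution.sh: copy the external
        # Kafka integration jars into the distribution's jars/ directory.
        for f in "$SPARK_HOME"/external/kafka-0-10*/target/spark-*.jar; do
          [ -e "$f" ] || continue  # glob may match nothing if modules weren't built
          case "$f" in
            *-sources.jar|*-tests.jar|*-javadoc.jar) ;;  # skip classifier jars
            *) cp "$f" "$DISTDIR/jars/" ;;
          esac
        done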


        People

          Assignee: Sean R. Owen (srowen)
          Reporter: Sean R. Owen (srowen)
          Votes: 0
          Watchers: 4

          Dates

            Created:
            Updated:
            Resolved: