Details
- Type: Bug
- Status: Closed
- Priority: Major
- Resolution: Fixed
- Fix Version/s: Spark-2.0.0
- Component/s: None
- Environment: Maven (3.3.9) on Mac OS X
Description
We use `spark-submit --packages <maven-coordinates> ...` to run Spark with any of the Bahir extensions.
In order to perform a manual integration test of a Bahir code change, developers have to build the respective Bahir module and then install it into their local Maven repository. Then, when running `spark-submit --packages <maven-coordinates> ...`, Spark will use Ivy to resolve the given maven-coordinates in order to add the necessary jar files to the classpath.
The first time Ivy encounters new maven coordinates, it will download the artifacts from the local or remote Maven repository. On all subsequent runs, Ivy will just use the previously cached jar files, keyed by group ID, artifact ID, and version, but irrespective of creation timestamp.
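One way to see this in practice, assuming the default Ivy cache location under `~/.ivy2/cache`, is to inspect what Ivy has cached for the artifact in question; the timestamps show how stale the cached jars are:

```shell
# Default Ivy cache location assumed (~/.ivy2/cache); the artifact path
# is the one from the streaming-mqtt example below.
CACHE_DIR="${HOME}/.ivy2/cache/org.apache.bahir/spark-streaming-mqtt_2.11"
if [ -d "$CACHE_DIR" ]; then
  # List the cached jars with their modification times.
  ls -l "$CACHE_DIR/jars"
else
  echo "no cached entry for spark-streaming-mqtt_2.11"
fi
```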
This behavior is fine when using spark-submit with released versions of Spark packages. For continuous development and integration testing, however, that Ivy caching behavior poses a problem.
To work around it, developers have to clear the local Ivy cache each time they install a new version of a Bahir package into their local Maven repository and before they run spark-submit.
For example, to test a code change in module streaming-mqtt, we would have to do ...
mvn clean install -pl streaming-mqtt
rm -rf ~/.ivy2/cache/org.apache.bahir/spark-streaming-mqtt_2.11/
${SPARK_HOME}/bin/spark-submit \
  --packages org.apache.bahir:spark-streaming-mqtt_2.11:2.0.0-SNAPSHOT \
  test.py
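The cache-clearing step can be folded into a small helper so it is not forgotten between iterations. This is only a sketch; the helper name is made up here, and the group and artifact values are the ones from the example above:

```shell
#!/bin/sh
# Sketch of a helper that removes the Ivy cache entry for one artifact,
# so the freshly installed snapshot is resolved on the next spark-submit.
IVY_CACHE="${HOME}/.ivy2/cache"

clear_ivy_entry() {
  # $1 = group ID, $2 = artifact ID
  rm -rf "${IVY_CACHE}/$1/$2"
  echo "cleared ${IVY_CACHE}/$1/$2"
}

clear_ivy_entry "org.apache.bahir" "spark-streaming-mqtt_2.11"
```

Run before each spark-submit to guarantee the cached jars never shadow a freshly installed build.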