  1. Bahir (Retired)
  2. BAHIR-38

Spark-submit does not use latest locally installed Bahir packages

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: Spark-2.0.0
    • Fix Version/s: Spark-2.0.0
    • Component/s: Build
    • Labels: None
    • Environment: Maven (3.3.9) on Mac OS X

    Description

      We use `spark-submit --packages <maven-coordinates> ...` to run Spark with any of the Bahir extensions.

      In order to perform a manual integration test of a Bahir code change, developers have to build the respective Bahir module and then install it into their local Maven repository. Then, when running `spark-submit --packages <maven-coordinates> ...`, Spark will use Ivy to resolve the given Maven coordinates in order to add the necessary jar files to the classpath.

      The first time Ivy encounters new Maven coordinates, it will download the artifacts from the local or remote Maven repository. On all subsequent runs, Ivy will just use the previously cached jar files, matched by group ID, artifact ID and version, irrespective of their creation timestamp.
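
      The cache lookup described above can be illustrated with a small shell helper. This is only a sketch: the function name `ivy_cache_dir` is hypothetical, and the directory layout `<home>/.ivy2/cache/<groupId>/<artifactId>` is assumed from the path used in the workaround in this issue, with the Ivy cache in its default location. Neither the version nor the build time of the jar appears in the directory name, which matches the observation that a rebuilt SNAPSHOT with unchanged coordinates is not picked up.

```shell
# Hypothetical helper: map Maven coordinates (groupId:artifactId:version)
# to the Ivy cache directory that holds the cached artifact.
# Assumes the default cache location under $HOME/.ivy2/cache.
ivy_cache_dir() {
    coords="$1"
    group="${coords%%:*}"       # text before the first ':'
    rest="${coords#*:}"         # text after the first ':'
    artifact="${rest%%:*}"      # text before the next ':'
    printf '%s/.ivy2/cache/%s/%s\n' "$HOME" "$group" "$artifact"
}

# Print the cache directory for the module used in this issue.
ivy_cache_dir org.apache.bahir:spark-streaming-mqtt_2.11:2.0.0-SNAPSHOT
```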

      This behavior is fine when using spark-submit with released versions of Spark packages. For continuous development and integration testing, however, that Ivy caching behavior poses a problem.

      To work around it, developers have to clear the local Ivy cache each time they install a new version of a Bahir package into their local Maven repository and before they run spark-submit.

      For example, to test a code change in module streaming-mqtt, we would have to run:

      mvn clean install -pl streaming-mqtt
      
      rm -rf ~/.ivy2/cache/org.apache.bahir/spark-streaming-mqtt_2.11/
      
      ${SPARK_HOME}/bin/spark-submit \
          --packages org.apache.bahir:spark-streaming-mqtt_2.11:2.0.0-SNAPSHOT \
          test.py
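
      Until the caching behavior is addressed, the three manual steps could be wrapped in one function so that the cache is never forgotten. This is a sketch under stated assumptions, not part of Bahir or Spark: the function name `retest_module` and the `DRY_RUN` switch are hypothetical, and the Ivy cache is assumed to live in its default location under `$HOME/.ivy2/cache`.

```shell
# Hypothetical wrapper around the manual workaround: rebuild the module,
# drop its cached copy from Ivy, then run spark-submit.
# Usage: retest_module <maven-module> <group:artifact:version> <application>
# Set DRY_RUN=1 to print the commands instead of executing them.
retest_module() {
    module="$1"
    coords="$2"
    app="$3"

    # Split the coordinates to locate the Ivy cache directory
    # (assumes the default layout $HOME/.ivy2/cache/<group>/<artifact>/).
    group="${coords%%:*}"
    rest="${coords#*:}"
    artifact="${rest%%:*}"

    run=""
    if [ -n "${DRY_RUN:-}" ]; then
        run="echo"
    fi

    # 1. Build the module and install it into the local Maven repository.
    $run mvn clean install -pl "$module" || return 1
    # 2. Remove the stale jar files that Ivy cached for these coordinates.
    $run rm -rf "$HOME/.ivy2/cache/$group/$artifact/" || return 1
    # 3. Run the test application against the freshly installed package.
    $run "${SPARK_HOME}/bin/spark-submit" --packages "$coords" "$app"
}
```

      For the example above, the call would be `retest_module streaming-mqtt org.apache.bahir:spark-streaming-mqtt_2.11:2.0.0-SNAPSHOT test.py`.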
      

            People

              Assignee: ckadner Christian Kadner
              Reporter: ckadner Christian Kadner
              Votes: 0
              Watchers: 3

                Time Tracking

                  Estimated: 4h
                  Remaining: 4h
                  Logged: Not Specified