Spark / SPARK-759

Change how we track AMI ids in the EC2 scripts


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: EC2
    • Labels: None

    Description

      I think we should change how we track AMI ids in the EC2 scripts.

      I don't like the current approach of using a URL to track the latest AMI id for each major version number:

      1. The contents of these URLs are not versioned, and there is no mechanism for looking up old AMI ids. There is also no repository or feed to watch for notifications when the AMI ids are updated.
      2. When the AMI behind a URL is updated to a new point release, things can break for some users. The Spark API is backwards-compatible across point releases, but user code linked against one release may not work when it connects to a cluster running a newer, API-compatible version of Spark. (It would work if users marked Spark as a 'provided' dependency and used something like Spark's `run` script to put the cluster's Spark JARs on the classpath.)

      This message from spark-users illustrates both problems:
      https://groups.google.com/d/msg/spark-users/T-Ug8G03Ctk/qV-YfE6Ws8MJ

      Patrick has a pull request that moves the AMI ids into the `spark-ec2` GitHub repository: https://github.com/shivaram/spark-ec2/pull/1. Can we extract this idea and apply it in the next release?
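      To make the proposal concrete, here is a minimal Python sketch of looking up an AMI id pinned to an exact Spark version from a plain-text file kept under version control. The file format, the function names `parse_ami_list` and `lookup_ami`, and the AMI ids are all hypothetical placeholders. This is not the format used by the pull request above.

```python
"""Hypothetical sketch: look up an AMI id pinned to an exact Spark version."""

from typing import Dict, Tuple

# Hypothetical file contents, as they might be committed to a versioned
# repository. Assumed format: "<spark-version> <region> <ami-id>".
# The AMI ids are placeholders, not real images.
EXAMPLE_AMI_LIST = """\
# spark-version  region     ami-id
0.7.0            us-east-1  ami-00000001
0.7.0            us-west-1  ami-00000002
0.7.2            us-east-1  ami-00000003
"""


def parse_ami_list(text: str) -> Dict[Tuple[str, str], str]:
    """Parse the whitespace-separated table into {(version, region): ami_id}."""
    table: Dict[Tuple[str, str], str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        version, region, ami_id = line.split()
        table[(version, region)] = ami_id
    return table


def lookup_ami(table: Dict[Tuple[str, str], str], version: str, region: str) -> str:
    """Return the AMI for an exact Spark version; never fall back to a newer one."""
    try:
        return table[(version, region)]
    except KeyError:
        known = sorted({v for (v, r) in table if r == region})
        raise KeyError(
            f"No AMI for Spark {version} in {region}; known versions: {known}"
        ) from None
```

      Because the table lives in a repository, its commit history records every old AMI id and serves as a feed that users can watch. Keying on the exact version means a launch fails loudly instead of silently moving a user to a newer point release.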


    People

      Assignee: Unassigned
      Reporter: Josh Rosen
      Votes: 0
      Watchers: 2
