[SPARK-759] Change how we track AMI ids in the EC2 scripts - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: None
Component/s: EC2
Labels: None

Description

I think we should change how we track AMI ids in the EC2 scripts.

I don't like the current approach of using a URL to track the latest AMI id for each major version number:

1. The contents of these URLs aren't versioned, and there's no mechanism for looking up old AMI ids. There's also no repository or feed to watch for notifications when the AMI ids are updated.
2. Updating the AMIs to a new Spark point release can break things for some users. The Spark API is backwards-compatible across point releases, but user code linked against one release may not work when it connects to a cluster running a newer, API-compatible version of Spark. (It would work if users marked Spark as a 'provided' dependency and used something like Spark's `run` script to add the cluster's Spark JARs to the classpath.)

This message from spark-users illustrates both problems:
https://groups.google.com/d/msg/spark-users/T-Ug8G03Ctk/qV-YfE6Ws8MJ

Patrick has a pull request that moves the AMIs into the `spark-ec2` GitHub repository: https://github.com/shivaram/spark-ec2/pull/1. Can we extract this idea and apply it in the next release?
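To illustrate the difference from the "latest AMI per major version" URL, here is a minimal sketch of pinning AMI ids per exact Spark release in a file kept under version control. It assumes a simple in-code table keyed by (Spark version, region). The names `AMI_TABLE` and `lookup_ami`, the version keys, and the AMI ids are all hypothetical placeholders; they are not taken from the pull request or from real images.

```python
# Hypothetical sketch: AMI ids pinned per Spark release in a version-controlled
# file, instead of being fetched from a mutable "latest AMI" URL.
# Version keys and AMI ids below are placeholders, not real images.

AMI_TABLE = {
    # (spark_version, region) -> AMI id
    ("0.7.2", "us-east-1"): "ami-11111111",
    ("0.7.3", "us-east-1"): "ami-22222222",
}


def lookup_ami(spark_version, region):
    """Return the AMI pinned for an exact Spark version and region.

    Old entries stay in the table (and in version-control history), so users
    can keep launching clusters that match the release they compiled against.
    """
    key = (spark_version, region)
    if key not in AMI_TABLE:
        raise KeyError(f"no AMI recorded for Spark {spark_version} in {region}")
    return AMI_TABLE[key]


if __name__ == "__main__":
    print(lookup_ami("0.7.2", "us-east-1"))
```

Because each release keeps its own entry, publishing an AMI for a newer point release adds a line (visible in the repository history) rather than silently changing what an existing lookup returns.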

People

Assignee: Unassigned

Reporter: Josh Rosen

Votes: 0

Watchers: 2

Dates

Created: 03/Jun/13 23:35

Updated: 26/Aug/13 10:08

Resolved: 26/Aug/13 10:08