Description
spark-ec2 currently has retry logic for installing software on a cluster and for deleting security groups.
It would be better to have logic that lets spark-ec2 explicitly wait until all the nodes in the cluster it is working on have reached a specific state.
Examples:
- Wait for all nodes to be up
- Wait for all nodes to be up and accepting SSH connections (then start installing stuff)
- Wait for all nodes to be down
- Wait for all nodes to be terminated (then delete the security groups)
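The "up and accepting SSH connections" case needs a liveness probe in addition to the EC2 instance state. A minimal sketch of such a probe is below; it only checks that the SSH port accepts TCP connections (the real script would shell out to `ssh` to confirm authentication works, so the function name and scope here are assumptions for illustration):

```python
import socket

def is_tcp_port_open(host, port=22, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    A cheap first-level check that sshd is reachable; it does not verify
    that SSH authentication would actually succeed.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except (socket.timeout, OSError):
        return False
```

A caller would loop over the cluster's public DNS names and only begin installation once this returns True for every node.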
Having a function in the spark_ec2.py script that blocks until the desired cluster state is reached would reduce the need for various retry logic. It would probably also eliminate the need for the --wait parameter.
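Such a blocking function could be sketched as follows. The name `wait_for_cluster_state` and the callback-based design are assumptions, not the actual spark_ec2.py API; `get_states` stands in for whatever fetches per-node state (e.g. boto instance status or the SSH probe above):

```python
import time

def wait_for_cluster_state(get_states, desired_state, timeout=300, poll_interval=5):
    """Block until every node reports `desired_state`, or raise on timeout.

    `get_states` is a zero-argument callable returning the current state
    string for each node in the cluster. Centralizing the polling here
    replaces the per-operation retry loops scattered through the script.
    """
    deadline = time.time() + timeout
    while time.time() < deadline:
        states = get_states()
        # Require a non-empty cluster where every node matches.
        if states and all(s == desired_state for s in states):
            return states
        time.sleep(poll_interval)
    raise RuntimeError(
        "cluster did not reach state %r within %d seconds"
        % (desired_state, timeout))
```

With this in place, "wait for all nodes to be terminated" before deleting security groups becomes a single call with `desired_state="terminated"` rather than a retry loop around the delete.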
Issue Links
- incorporates
  - SPARK-1751: spark-ec2 scripts should check for SSH to be up (Resolved)
- is related to
  - SPARK-1574: ec2/spark_ec2.py should provide option to control number of attempts for SSH operations (Resolved)
  - SPARK-5473: Expose SSH failures after status checks pass (Resolved)