This is an umbrella issue tracking the path towards multicloud EMR with Whirr.
Things that must happen to get there (as discussed on IRC):
- Hadoop deployment must be "rock solid"
- Submitting and monitoring a Hadoop MapReduce job through Whirr
- distcp from a blobstore to a Hadoop/HBase cluster
- CLI component for job submission and monitoring
Things that would be nice to have in addition:
- Pig service
- Hive service
- Sqoop service
- regular and spot instances in EMR
- multistage provisioning (different cluster sizes for different phases)