This Python script starts Amazon EC2 or OpenStack instances, installs Java and Hadoop on them, and configures HDFS and YARN via Puppet so that Flink can run on top of Hadoop YARN.
The script downloads the Java and Hadoop binaries and hands them over to Puppet for automated provisioning.
User-data scripts install Puppet on the master and slave instances (Debian only). The security groups the cluster needs are created and configured as well.
The master instance then starts a self-configuration process that sets up the Puppet modules according to the cluster structure.
The master checks whether the Hadoop YARN web interface is accessible and waits until all expected nodes are up and running. It then starts a Stratosphere YARN session. TaskManager and JobManager memory allocations are set in instances.cfg.
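The wait step can be pictured roughly as follows. This is a minimal Python 3 sketch, not the code the script actually uses: it assumes the ResourceManager REST endpoint /ws/v1/cluster/metrics on the default port 8088, and the function names, retry count and delay are hypothetical.

```python
import json
import time
from urllib.error import URLError
from urllib.request import urlopen


def active_nodes(metrics_json):
    """Return the number of active NodeManagers from a cluster-metrics response."""
    return json.loads(metrics_json)["clusterMetrics"]["activeNodes"]


def fetch_metrics(master_host, port=8088):
    """Fetch the cluster metrics; port 8088 is assumed (YARN default)."""
    url = "http://%s:%d/ws/v1/cluster/metrics" % (master_host, port)
    with urlopen(url, timeout=5) as response:
        return response.read().decode("utf-8")


def wait_for_nodes(expected, fetch, retries=60, delay=10, sleep=time.sleep):
    """Poll until at least `expected` nodes are active; return True on success."""
    for _ in range(retries):
        try:
            if active_nodes(fetch()) >= expected:
                return True
        except (URLError, OSError, ValueError, KeyError):
            pass  # web interface not reachable or response not ready yet
        sleep(delay)
    return False
```

A call such as wait_for_nodes(3, lambda: fetch_metrics("master-host")) would block until three nodes report as active or the retries run out. Passing the fetch function in as an argument keeps the polling logic testable without a running cluster.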
- The configuration reserves 600 MB for the operating system and allocates the rest to the YARN node
- The Flink web interface is not accessible because the yarn.web.proxy throws a NullPointerException
- Only runs on Debian derivatives because it uses apt-get
- Tested with ubuntu-13.08
- Flink is still named Stratosphere
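
The 600 MB operating-system reservation from the notes above amounts to a simple subtraction. The sketch below only illustrates that arithmetic. The function name is hypothetical, and how the script actually reads the instance memory is not shown here.

```python
OS_RESERVED_MB = 600  # reserved for the operating system, per the notes above


def yarn_node_memory_mb(total_mb, reserved_mb=OS_RESERVED_MB):
    """Return the memory (in MB) left for the YARN node after the OS reservation."""
    if total_mb <= reserved_mb:
        raise ValueError("instance has no memory left for YARN after the reservation")
    return total_mb - reserved_mb


# Example: an instance with 4096 MB of RAM leaves 3496 MB for the YARN node.
print(yarn_node_memory_mb(4096))
```

The TaskManager and JobManager allocations in instances.cfg have to fit within this remainder.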