But that would mean one more external file to manage. Also, BoxGrinder suggests using relative paths to external files because the path structure is preserved in the target VM. Hence, my approach is more straightforward.
That's ok, we have git
It also makes maintenance easier by separating the content of the file from how it is managed.
The following should work, shouldn't it?
and with the script in the same directory as the appliance definition file.
These services shouldn't be started unless HDFS already has all the needed directories in it.
This requirement is addressable:
- A script with a higher priority than the Apache Hadoop daemons that checks whether the directories/users are initialized (and optionally whether this is the first boot). The recently checked-in script that initializes HDFS should help in that regard.
- Having the HDFS-related daemons set with higher priorities so they start before the YARN ones.
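To illustrate the first option, here is a minimal sketch of such a guard script. The chkconfig priorities in the header and the marker-file path are assumptions for illustration; in the real appliance the check could be `hadoop fs -test -d` against the expected HDFS directories instead of a local marker dropped by the HDFS-init script.

```shell
#!/bin/sh
# Hypothetical guard: run after the HDFS daemons but before the YARN ones,
# and block until HDFS initialization has completed.
#
# chkconfig: 345 86 14
# description: Waits for HDFS initialization before YARN daemons start.

# Marker file the HDFS-init script is assumed to create once the
# directories/users exist (path is a placeholder).
INIT_MARKER="${INIT_MARKER:-/var/lib/hadoop/.hdfs-initialized}"

hdfs_ready() {
    # In the real appliance this could be: hadoop fs -test -d /user
    [ -f "$INIT_MARKER" ]
}

wait_for_hdfs() {
    retries=${1:-30}
    while [ "$retries" -gt 0 ]; do
        if hdfs_ready; then
            echo "HDFS initialized"
            return 0
        fi
        retries=$((retries - 1))
        sleep 1
    done
    echo "HDFS not initialized; refusing to start YARN daemons" >&2
    return 1
}
```

With a start priority between the HDFS daemons and the YARN daemons, a failing check here keeps the YARN services from coming up against an uninitialized filesystem.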
Also, we don't know in advance how much memory a user will allot to the VM, so it might be a dreadful sight trying to run all six of these heavy daemons in, say, 2GB of memory.
Well, this appliance only provides Apache Hadoop right now, so I am not sure what a user would use it for if not Apache Hadoop.
Also, Cloudera's demo VM has all of the CDH daemons starting on boot (so that includes Apache HBase, Apache Oozie, Apache ZooKeeper...) along with Xfce, and only recommends 3GB of RAM. So with only Apache Hadoop in this appliance, the memory requirements shouldn't be that big (I don't remember the exact number, so I don't want to quote one).
Note also that I intended to provide this appliance as a base, with other appliances inheriting from it, to make it easy to build appliances for different purposes: for instance, an appliance that builds a datanode for AWS versus one for a developer who wants Apache Bigtop in a box where they can try out their code.
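If I recall the definition format correctly, BoxGrinder supports this kind of composition through an `appliances:` section that pulls in another definition. A rough sketch, where both appliance names are hypothetical:

```yaml
# bigtop-datanode-aws.appl (hypothetical derived appliance)
name: bigtop-datanode-aws
summary: Datanode appliance for AWS, built on the Hadoop base appliance
appliances:
  - bigtop-base          # inherits packages, repos, post sections from the base
packages:
  - hadoop-hdfs-datanode # only the datanode-specific extras go here
```

The derived definitions then stay small: each one only adds the packages and post steps specific to its purpose.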