The (old) Hadoop EC2 scripts that ours are based on do not properly handle instance types whose local storage is attached as multiple volumes. The AMI build script uses Amazon's latest Fedora 8 AMI as its base, and that image does not automatically mount instance storage spread across multiple volumes either.
We recommend using two instance types:
- c1.medium for ZooKeeper, which has one vdisk of 340 GB as /dev/sdb, mounted on /mnt by the base image
- c1.xlarge for master and slaves, which has four vdisks of 420 GB as /dev/sd[bcde], only one of which is mounted on /mnt by the base image
Additionally, the m1.xlarge instance type, which a user might use anyway, has two vdisks of 420 GB as /dev/sd[bc], only one of which is mounted on /mnt by the base image.
The hbase-ec2-init-remote.sh script should probe for all available instance storage devices, mount them, and update the DataNode configuration appropriately.
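A rough sketch of what that probing could look like follows. It is not the actual hbase-ec2-init-remote.sh code. The function names (mount_point_for, build_data_dirs, probe_and_mount), the /mnt2, /mnt3, ... mount point naming, and the hadoop/dfs/data subdirectory are all assumptions made for illustration; the candidate device list is taken from the instance types described above.

```shell
#!/bin/sh
# Sketch only: function names, mount point naming, and the data
# directory suffix are assumptions, not the real init script.

# Print the mount point for the Nth (0-based) instance-store device:
# /mnt for the first one, then /mnt2, /mnt3, ...
mount_point_for() {
  if [ "$1" -eq 0 ]; then
    echo "/mnt"
  else
    echo "/mnt$(( $1 + 1 ))"
  fi
}

# Join the given mount points into a comma-separated list of data
# directories, suitable as a dfs.data.dir value.
build_data_dirs() {
  dirs=""
  for mp in "$@"; do
    dirs="${dirs:+$dirs,}$mp/hadoop/dfs/data"
  done
  echo "$dirs"
}

# Probe the candidate devices, mount those that exist and are not yet
# mounted, and print the resulting data directory list. Needs root.
probe_and_mount() {
  i=0
  mounts=""
  for dev in /dev/sdb /dev/sdc /dev/sdd /dev/sde; do
    # Skip devices this instance type does not provide.
    [ -b "$dev" ] || continue
    # Reuse the existing mount point if the base image mounted it.
    mp=$(awk -v d="$dev" '$1 == d { print $2; exit }' /proc/mounts)
    if [ -z "$mp" ]; then
      mp=$(mount_point_for "$i")
      mkdir -p "$mp" && mount "$dev" "$mp" || continue
    fi
    mounts="$mounts $mp"
    i=$(( i + 1 ))
  done
  # Word splitting of $mounts is intentional here.
  build_data_dirs $mounts
}
```

The block only defines functions. The init script would call probe_and_mount as root and substitute its output into the dfs.data.dir property of the DataNode configuration it generates. On a c1.medium this would be expected to yield a single directory under /mnt, and on a c1.xlarge one directory per volume.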