Details
Type: Improvement
Status: Closed
Priority: Minor
Resolution: Won't Fix
Affects Version/s: 0.18.1
Fix Version/s: None
Component/s: None
Description
src/contrib/ec2/bin/image/hadoop-init is appended to rc.local on all
EC2 cluster boxes. This shell script generates the hadoop-site.xml
configuration file. It starts by setting some defaults that are meant
to populate the file, but these defaults are then overwritten by the
user data (from hadoop-ec2-env.sh) that launch-hadoop-master and
launch-hadoop-slaves pass to the EC2 instance.
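The override can be reproduced locally with a small bash sketch. It copies the default block from hadoop-init and then applies the same `tr ',' '\n'` plus `source` steps, but replaces the wget call to the EC2 metadata service with a hypothetical user-data string. The contents of `user_data` (variable names and values) are assumptions for illustration, not necessarily what the launch scripts send.

```bash
#!/usr/bin/env bash
# Sketch: why the defaults in hadoop-init never take effect.
# Assumption: the user data defines MAX_MAP_TASKS and MAX_REDUCE_TASKS.

INSTANCE_TYPE="m1.large"   # hypothetical instance type

# Defaults, copied from hadoop-init
MAX_TASKS=3
[ "$INSTANCE_TYPE" == "m1.large" ] && MAX_TASKS=6
[ "$INSTANCE_TYPE" == "m1.xlarge" ] && MAX_TASKS=12
MAX_MAP_TASKS=$MAX_TASKS
MAX_REDUCE_TASKS=$MAX_TASKS

# Hypothetical stand-in for:
#   wget -q -O - http://169.254.169.254/latest/user-data
user_data="MAX_MAP_TASKS=2,MAX_REDUCE_TASKS=1"

# Same parsing as hadoop-init: one assignment per line, then source it,
# which overwrites the defaults set above.
tmp=$(mktemp)
echo "$user_data" | tr ',' '\n' > "$tmp"
source "$tmp"
rm -f "$tmp"

echo "MAX_MAP_TASKS=$MAX_MAP_TASKS MAX_REDUCE_TASKS=$MAX_REDUCE_TASKS"
```

Running it prints `MAX_MAP_TASKS=2 MAX_REDUCE_TASKS=1`: in this sketch the instance-type logic computes MAX_TASKS=6, but that value never survives the `source` step.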
This isn't a bug; setting variables in hadoop-ec2-env.sh does the
right thing. However, the defaults are dead and misleading code (well,
they misled me), and running a test Hadoop job to figure out what's
going on takes a little effort.
Suggested change to hadoop-init:
Remove these lines:
    # set defaults
    MAX_TASKS=3
    [ "$INSTANCE_TYPE" == "m1.large" ] && MAX_TASKS=6
    [ "$INSTANCE_TYPE" == "m1.xlarge" ] && MAX_TASKS=12
    MAX_MAP_TASKS=$MAX_TASKS
    MAX_REDUCE_TASKS=$MAX_TASKS
Add a comment before the lines which access the user data:
    # get user data passed in by the ec2 instance launch
    wget -q -O - http://169.254.169.254/latest/user-data | tr ',' '\n' > /tmp/user-data
    source /tmp/user-data