Hadoop Common / HADOOP-4585

unused and misleading configuration in hadoop-init


Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Won't Fix
    • Affects Version/s: 0.18.1
    • Fix Version/s: None
    • Component/s: contrib/cloud
    • Labels: None

    Description

      src/contrib/ec2/bin/image/hadoop-init is appended to rc.local on every
      EC2 cluster instance. This shell script generates the hadoop-site.xml
      configuration file. It starts with some default settings, which are
      used to populate the file. These defaults are then overwritten by the
      user data (from hadoop-ec2-env.sh) that launch-hadoop-master and
      launch-hadoop-slaves pass to the EC2 instance.
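The override order described above can be sketched in bash. This is a minimal, self-contained illustration, not the actual hadoop-init: the INSTANCE_TYPE value and the USER_DATA string are hypothetical stand-ins for what the instance would read from the EC2 metadata service, and a temporary file replaces /tmp/user-data.

```shell
#!/usr/bin/env bash
# Defaults, set the same way hadoop-init sets them before reading user data.
INSTANCE_TYPE="m1.large"   # hypothetical instance type for this sketch
MAX_TASKS=3
[ "$INSTANCE_TYPE" == "m1.large" ] && MAX_TASKS=6
[ "$INSTANCE_TYPE" == "m1.xlarge" ] && MAX_TASKS=12
MAX_MAP_TASKS=$MAX_TASKS
MAX_REDUCE_TASKS=$MAX_TASKS

# Hypothetical user data, standing in for the wget call against the metadata service.
USER_DATA="MAX_MAP_TASKS=2,MAX_REDUCE_TASKS=1"

# Same transformation hadoop-init applies: commas become newlines, then the result is sourced.
user_data_file=$(mktemp)
printf '%s\n' "$USER_DATA" | tr ',' '\n' > "$user_data_file"
source "$user_data_file"
rm -f "$user_data_file"

# The sourced assignments replace the defaults computed above.
echo "MAX_MAP_TASKS=$MAX_MAP_TASKS MAX_REDUCE_TASKS=$MAX_REDUCE_TASKS"
```

Because the user data is sourced after the defaults, whenever it sets a variable the corresponding default is simply discarded.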

      This isn't a bug; setting variables in hadoop-ec2-env.sh does the
      right thing. However, the defaults are dead and misleading code (well,
      they misled me), and figuring out what is actually going on takes a
      little effort, since it means running a test Hadoop job.

      Suggested change to hadoop-init:

      Remove these lines:

      # set defaults
      MAX_TASKS=3
      [ "$INSTANCE_TYPE" == "m1.large" ] && MAX_TASKS=6
      [ "$INSTANCE_TYPE" == "m1.xlarge" ] && MAX_TASKS=12
      
      MAX_MAP_TASKS=$MAX_TASKS
      MAX_REDUCE_TASKS=$MAX_TASKS
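With these defaults gone, nothing else in the script supplies a fallback, so a launch whose user data omitted these variables would presumably write empty values into the configuration. A hedged sketch of a check that could run after the user data is sourced; require_vars is a hypothetical helper, not part of hadoop-init, and it relies on bash's indirect expansion ${!name}.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report every named variable that is unset or empty.
# Returns 0 if all are set, 1 otherwise.
require_vars() {
  local missing=0 name
  for name in "$@"; do
    if [ -z "${!name}" ]; then
      echo "hadoop-init: $name was not set by user data" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example: simulate user data that sets only one of the two variables.
unset MAX_MAP_TASKS MAX_REDUCE_TASKS
MAX_MAP_TASKS=2
if require_vars MAX_MAP_TASKS MAX_REDUCE_TASKS; then
  status=ok
else
  status=missing
fi
echo "$status"
```

Failing early like this keeps the "values come only from hadoop-ec2-env.sh" contract explicit instead of silently producing an incomplete hadoop-site.xml.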
      

      Add a comment before the lines that fetch the user data:

      # get the user data passed in at EC2 instance launch
      wget -q -O - http://169.254.169.254/latest/user-data | tr ',' '\n' > /tmp/user-data
      source /tmp/user-data
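Once the user data is sourced, hadoop-init writes hadoop-site.xml from the resulting variables. The following is a rough sketch of that step, assuming a heredoc and the Hadoop 0.18 property names mapred.tasktracker.map.tasks.maximum and mapred.tasktracker.reduce.tasks.maximum; the exact properties and output path used by the real script are not shown in this issue, so SITE_XML here is a hypothetical temporary file and the variable values are hypothetical.

```shell
#!/usr/bin/env bash
# Values as they might look after `source /tmp/user-data` (hypothetical).
MAX_MAP_TASKS=2
MAX_REDUCE_TASKS=1

# Hypothetical output path; the real script writes into Hadoop's conf directory.
SITE_XML=$(mktemp)

# Unquoted heredoc delimiter, so the shell expands $MAX_MAP_TASKS and $MAX_REDUCE_TASKS.
cat > "$SITE_XML" <<EOF
<?xml version="1.0"?>
<configuration>
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>$MAX_MAP_TASKS</value>
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>$MAX_REDUCE_TASKS</value>
</property>
</configuration>
EOF

cat "$SITE_XML"
```

Leaving the delimiter unquoted is what lets the shell substitute the values; quoting it ('EOF') would write the literal text $MAX_MAP_TASKS into the file instead.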
      


          People

            Assignee: Unassigned
            Reporter: Karl Lehenbauer
