WHIRR-189

Hadoop on EC2 should use all available local storage

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.8.0
    • Component/s: service/hadoop
    • Labels: None

      Attachments

    1. WHIRR-189.patch
      33 kB
      Andrew Bayer
    2. WHIRR-189.patch
      32 kB
      Andrew Bayer
    3. WHIRR-189.patch
      26 kB
      Tom White
    4. WHIRR-189.patch
      25 kB
      Tom White

        Activity

        Olivier Grisel added a comment - edited

        FYI the local drive that has the most space on the m1.small instances is not mounted on / but on /media/ephemeral0

        [ec2-user@ip-10-203-86-244 ec2-user]$ df -h
        Filesystem Size Used Avail Use% Mounted on
        /dev/xvda1 7.9G 1.8G 6.1G 23% /
        tmpfs 840M 0 840M 0% /dev/shm
        /dev/xvda2 147G 6.4G 133G 5% /media/ephemeral0

        Both drives have the same (read) speed though:

        [ec2-user@ip-10-203-86-244 ec2-user]$ sudo /sbin/hdparm -tT /dev/xvda1
        /dev/xvda1:
        Timing cached reads: 3966 MB in 2.05 seconds = 1939.33 MB/sec
        Timing buffered disk reads: 372 MB in 3.02 seconds = 123.23 MB/sec

        [ec2-user@ip-10-203-86-244 ec2-user]$ sudo /sbin/hdparm -tT /dev/xvda2
        /dev/xvda2:
        Timing cached reads: 4442 MB in 2.00 seconds = 2222.59 MB/sec
        Timing buffered disk reads: 376 MB in 3.00 seconds = 125.19 MB/sec
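
        Finding the mount point with the most free space can be automated by parsing `df -P` output, much like the listing above. This one-liner is illustrative only (not from Whirr); column 4 of POSIX df output is available space in KB, column 6 is the mount point:

```shell
# Print the mount point with the most available space, by scanning
# POSIX-format df output ($4 = available KB, $6 = mount point).
df -P | awk 'NR > 1 { if ($4 + 0 > max) { max = $4 + 0; mp = $6 } } END { print mp }'
```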

        Adrian Cole added a comment -

        the jclouds NodeMetadata object has a hardware.volumes collection you could use to determine this stuff.

        Tom White added a comment -

        On EC2, I noticed that for an m1.large instance jclouds reports two local volumes, /dev/sdb and /dev/sdc (for EBS-backed images), even though /dev/sdc is not present on the instance. This is explained by the second note on http://docs.amazonwebservices.com/AWSEC2/latest/UserGuide/index.html?instance-storage-concepts.html. We can use EC2TemplateOptions to map all the ephemeral devices (though it's odd that jclouds reports a device that isn't actually mapped). This will require EC2-specific code that knows about the ephemeral devices on each instance size - I wonder if jclouds abstracts this?

        Once the above is sorted out, I imagine the implementation would include all the non-boot-device volumes for its storage. HadoopConfigurationBuilder would set dfs.data.dir, dfs.name.dir, and mapred.local.dir to use all the volumes. And the volumes would need mounting/symlinking (and possibly formatting, in the case of EC2) using scripts like the following (based on code from the Python scripts):

        # TODO: make sure that mkfs.xfs is installed
        function prep_disk() {
          mount=$1
          device=$2
          automount=${3:-false}
        
          # Check /proc/mounts directly: wrapping "mountpoint -q" in [ ... ]
          # is always false, since -q suppresses all output.
          if grep -qs "^$device " /proc/mounts; then
            echo "$device is mounted"
            if [ ! -d "$mount" ]; then
              echo "No mount directory; symlinking to the existing mount point"
              ln -s "$(grep "^$device " /proc/mounts | awk '{print $2}')" "$mount"
            fi
          else
            echo "warning: ERASING CONTENTS OF $device"
            mkfs.xfs -f "$device"
            if [ ! -e "$mount" ]; then
              mkdir "$mount"
            fi
            mount -o defaults,noatime "$device" "$mount"
            if $automount; then
              echo "$device $mount xfs defaults,noatime 0 0" >> /etc/fstab
            fi
          fi
        }
        
        prep_disk /data1 /dev/sdb true
        prep_disk /data2 /dev/sdc true
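
        The mount points prepared above would then be wired into Hadoop's configuration. A minimal sketch of building the comma-separated dfs.data.dir value; the /dataN list and the subdirectory layout here are assumptions for illustration, not taken from the patch:

```shell
# Sketch: build the comma-separated dfs.data.dir value from the /dataN
# mount points prep_disk created; list and subdirectories are illustrative.
mounts="/data0 /data1"        # stand-in for the discovered mount points
data_dirs=""
for m in $mounts; do
  data_dirs="${data_dirs:+$data_dirs,}$m/hadoop/hdfs/data"
done
echo "$data_dirs"             # /data0/hadoop/hdfs/data,/data1/hadoop/hdfs/data
```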
        
        Adrian Cole added a comment -

        @tom I think this is a bug we'll have to test in jclouds. right now, we don't verify the volumes listed are actually available. you mind raising an issue on this? thnx.

        Tom White added a comment -

        Thanks Adrian. I opened http://code.google.com/p/jclouds/issues/detail?id=529 for this.

        BTW does jclouds know which ephemeral devices are not mapped by default for each instance size? It would be nice to be able to say "map all devices".

        Tom White added a comment -

        Here's an initial patch for this. Hadoop on EC2 works. The idea is that there's a function called prepare_all_disks() which is passed volume information, and which creates directories /data0, /data1, etc for the service to use. This moves more cloud-specific code out of the scripts.
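
        A rough sketch of how such a function might look. This is hypothetical: the patch's actual volume-information format isn't shown here, so a comma-separated device list is assumed, and the prep_disk invocations (the function from the earlier comment) are only printed rather than run:

```shell
# Hypothetical sketch of prepare_all_disks: map each device in a
# comma-separated list to /data0, /data1, ...; prints the prep_disk
# calls instead of actually formatting or mounting anything.
prepare_all_disks() {
  i=0
  for device in $(echo "$1" | tr ',' ' '); do
    echo "prep_disk /data$i $device"
    i=$((i + 1))
  done
}

prepare_all_disks "/dev/sdb,/dev/sdc"
```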

        Tibor Kiss added a comment -

        +1
        I made a try to this patch and it works on EC2.

        Andrei Savu added a comment -

        Looks good. It seems like this patch incorporates WHIRR-328. Do we want that, or should we search for a better way of avoiding EBS (e.g. by rewriting parts of the templateBuilder in Whirr)?

        Andrei Savu added a comment -

        Should we try to push this in 0.6.0? Looks good to me.

        Adrian Cole added a comment -

        any way we can get an integration test for this? ex. verify through some hdfs call the size is correct?

        Tom White added a comment -

        Refreshed patch. Needs some testing.

        Stefan Ackermann added a comment -

        Please fix this issue. It is important: using all the hard drives can improve performance on IO-bound jobs considerably. After enabling the second hard drive on an EC2 m1.large machine, one of our programs improved its speed by 20% (total job time).

        So please fix this, especially as there is already a patch that just needs updating.

        Adrian Cole added a comment - edited

        one note about the patch.

        The following is incorrect:
        + // However, you can only make that call if you know that the image is
        + // EBS-backed, and there is no way to find this out (before launch) in
        + // jclouds at present

        After you build a template, but before you create the nodes, you can check the image in the resolved template.

        For example, we are already doing something similar in

        BootstrapTemplate.setSpotInstancePriceIfSpecified

        basically, we can add BootstrapTemplate.mapEphemeralIfImageIsEBSBacked:

        if (EC2ImagePredicates.rootDeviceType(EBS).apply(template.getImage())) {
          template.getOptions().as(EC2TemplateOptions.class)
              .mapEphemeralDeviceToDeviceName("/dev/sdc", "ephemeral1");
        }

        I think the above is the missing link in this patch. I hope it helps!

        Tom White added a comment -

        Stefan - thanks for reporting that number of 20% - I definitely agree it would be useful to get this patch in. It still needs some testing though.

        Adrian - that's really useful to know, and should help simplify the patch. Thanks!

        Andrew Bayer added a comment -

        So lemme see if I understand what more needs to be done to the patch. Does it just need that BootstrapTemplate change?

        Tom White added a comment -

        I think so - and some testing to check that additional storage is actually being used by Hadoop. Thanks for looking at this Andrew!

        Andrew Bayer added a comment -

        Updated/modified the patch to work on trunk. I also incorporated the EBS logic Adrian mentioned, and removed the "don't use EBS by default" behavior, since we've got that logic now and I'm not sure we really want to break t1.micros by default.

        I've tested with recipes/hadoop-yarn-ec2.properties on both EBS and instance store; in both cases we got /mnt symlinked at /data0 (and /data), plus a new /data1 backed by /dev/sdc.

        Adrian Cole added a comment -

        you mind trying s/AWSEC2/EC2/g and see if it still works?

        Andrew Bayer added a comment -

        Ah-ha - I forget what my specific reason for this switch originally was; it may well have been just wanting to avoid more imports. The new patch has it switched.

        Tom White added a comment -

        +1. Thanks for trying it out Andrew.

        Andrew Bayer added a comment -

        Committed.


          People

          • Assignee: Andrew Bayer
          • Reporter: Tom White
          • Votes: 0
          • Watchers: 5
